Critical Assessment of Function Annotation

CAFA is a community-wide challenge designed to provide a large-scale assessment of computational methods dedicated to predicting protein function.
More information can be found at http://biofunctionprediction.org/cafa/ and in the CAFA2 paper (Jiang et al., 2016).

This toolset assesses CAFA submissions.
Several helper programs have also been developed to assist in evaluating predictions.
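
To give a rough idea of what evaluating predictions involves, here is a minimal sketch of the protein-centric maximum F-measure (Fmax), a metric commonly reported in CAFA evaluations. It is an illustration only, not the toolset's code: the function name f_max, the dictionary layout of the inputs, and the threshold grid are assumptions made for this example. The sketch also skips the propagation of terms to their ontology ancestors, which a real assessment performs.

```python
def f_max(predictions, truth, thresholds=None):
    """Protein-centric Fmax (illustrative sketch).

    predictions: {protein_id: {term_id: score in [0, 1]}}  (assumed layout)
    truth:       {protein_id: non-empty set of true term_ids}
    """
    if thresholds is None:
        # Assumed threshold grid: 0.01, 0.02, ..., 1.00
        thresholds = [i / 100 for i in range(1, 101)]

    best = 0.0
    for t in thresholds:
        precisions = []   # only proteins with at least one prediction at t
        recall_sum = 0.0  # accumulated over all benchmark proteins
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)

        if not precisions:
            continue  # no protein has a prediction at this threshold
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            f = 2 * precision * recall / (precision + recall)
            best = max(best, f)
    return best
```

Precision is averaged only over proteins that have at least one prediction at a given threshold, while recall is averaged over every benchmark protein, so a method is not rewarded for staying silent on hard targets.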

GitHub repositories are here.

Reconstruction of Ancestral Gene Blocks Using Events

ROAGUE is a tool to reconstruct ancestors of gene blocks in prokaryotic genomes. Gene blocks are genes co-located on the chromosome. In many cases, gene blocks are conserved between bacterial species, sometimes as operons, when genes are co-transcribed. The conservation is rarely absolute: gene loss, gain, duplication, block splitting and block fusion are frequently observed.

GitHub repository is here.

Bacteriocin Prediction using Word Embedding with Deep Recurrent Neural Networks

Antibiotic resistance is a major public health crisis, and finding new sources of antimicrobial drugs is crucial to solving it. Bacteriocins, which are bacterially produced antimicrobial peptide products, are candidates for broadening our pool of antimicrobials. The discovery of new bacteriocins by genomic mining is hampered by their sequences' low complexity and high variance, which frustrate sequence similarity-based searches. Here we use word embeddings of protein sequences to represent bacteriocins, and subsequently apply Recurrent Neural Networks and Support Vector Machines to predict novel bacteriocins from protein sequences without using sequence similarity. We developed a word embedding method that accounts for sequence order, providing a better classification than a simple summation of the same word embeddings. We use the UniProt/TrEMBL database to acquire the word embeddings, taking advantage of a large volume of unlabeled data. Our method predicts, with a high probability, six previously unknown putative bacteriocins in Lactobacillus. More generally, representing sequences with word embeddings that preserve sequence order information can be applied to protein classification problems for which sequence homology cannot be used.
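
The difference between summing word embeddings and keeping them in sequence order can be sketched in a few lines. This is an illustration under stated assumptions, not the published method: the word length k=3, the function names, and the hash-based toy_embedding (a stand-in for embeddings trained on unlabeled sequences) are all hypothetical choices made for this example.

```python
import hashlib


def kmers(seq, k=3):
    """Split a protein sequence into overlapping k-letter 'words'."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_embedding(word, dim=8):
    """Hypothetical stand-in for a trained embedding lookup.

    Returns a deterministic integer vector derived from a hash of the word.
    """
    digest = hashlib.sha256(word.encode("ascii")).digest()
    return list(digest[:dim])


def summed_representation(seq, k=3, dim=8):
    """Order-free representation: element-wise sum of all word vectors."""
    total = [0] * dim
    for word in kmers(seq, k):
        for i, value in enumerate(toy_embedding(word, dim)):
            total[i] += value
    return total


def ordered_representation(seq, k=3, dim=8):
    """Order-preserving representation: one vector per word, in sequence
    order. A matrix like this is what a recurrent network would read
    step by step."""
    return [toy_embedding(word, dim) for word in kmers(seq, k)]


# Two different sequences that contain the same words in a different order.
seq_a = "MKLMKVMK"
seq_b = "MKVMKLMK"

same_sum = summed_representation(seq_a) == summed_representation(seq_b)
same_order = ordered_representation(seq_a) == ordered_representation(seq_b)
print("summed representations equal: ", same_sum)
print("ordered representations equal:", same_order)
```

The two example sequences share the same set of 3-letter words, so a summed representation cannot tell them apart, whereas the ordered representation keeps them distinct.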

The associated paper can be found here.

GitHub repository is here.


SwiftOrtho is an OrthoMCL-like tool. It identifies orthologs, paralogs, and co-orthologs across genomes.

GitHub repository is here.

A Pipeline for Operon Evaluation in Metagenomes

POEM is a pipeline that predicts operons from metagenomic data, identifies core functions from the predicted operons, and visualizes the results.

GitHub repository is here.


Authorator helps you manage a large number of authors in your manuscript. If you write in LaTeX, that is.

GitHub repository is here.