A major hindrance to studies of microbial diversity has been that the vast majority of microbes cannot be cultured in the laboratory and thus are not amenable to traditional methods of characterization. Environmental shotgun sequencing (ESS) overcomes this hurdle by sequencing the DNA from the organisms present in a microbial community. The interpretation of these metagenomic data can be greatly facilitated by associating every sequence read with its source organism. We report the development of CompostBin, a DNA composition-based algorithm for analyzing metagenomic sequence reads and distributing them into taxon-specific bins. Unlike previous methods that seek to bin assembled contigs and often require training on known reference genomes, CompostBin can accurately bin raw sequence reads without the need for assembly or training. CompostBin uses a novel weighted PCA algorithm to project the high-dimensional DNA composition data into an informative lower-dimensional space, and then uses the normalized cut clustering algorithm on this filtered data set to classify sequences into taxon-specific bins. We demonstrate the algorithm's accuracy on a variety of low- to medium-complexity data sets.
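
The pipeline described above (composition features, weighted PCA, normalized cut) can be illustrated with a minimal NumPy sketch. It is not the CompostBin implementation, and several choices are assumptions made only for illustration: tetramer frequencies as the composition feature, caller-supplied per-read weights (uniform in the demo, standing in for whatever weighting the paper uses), a Gaussian affinity with a median-distance bandwidth, and a single two-way split obtained by thresholding the second eigenvector of the normalized Laplacian at zero. The function names and the synthetic reads are hypothetical.

```python
import itertools
import numpy as np

K = 4  # assumed k-mer size for this sketch
KMERS = ["".join(p) for p in itertools.product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_frequencies(seq):
    """Normalized tetramer frequency vector of one read."""
    counts = np.zeros(len(KMERS))
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])
        if idx is not None:  # skip k-mers containing N or other symbols
            counts[idx] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def weighted_pca(X, weights, n_components=3):
    """Project rows of X onto the top eigenvectors of the weighted covariance."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    centered = X - w @ X                      # subtract the weighted mean
    cov = (centered * w[:, None]).T @ centered
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return centered @ top


def normalized_cut_bipartition(Y):
    """Two-way spectral split in the style of normalized cuts."""
    sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    sigma2 = np.median(sq[sq > 0])            # assumed bandwidth heuristic
    W = np.exp(-sq / (2.0 * sigma2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(Y)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)
    # Second-smallest eigenvector, mapped back to the generalized problem
    # (D - W) y = lambda * D y, then thresholded at zero.
    y = d_inv_sqrt * vecs[:, 1]
    return (y > 0).astype(int)


# Synthetic demo: 20 AT-rich reads followed by 20 GC-rich reads (made-up data).
rng = np.random.default_rng(0)
bases = list("ACGT")
reads = ["".join(rng.choice(bases, size=1000, p=[0.35, 0.15, 0.15, 0.35]))
         for _ in range(20)]
reads += ["".join(rng.choice(bases, size=1000, p=[0.15, 0.35, 0.35, 0.15]))
          for _ in range(20)]

X = np.array([kmer_frequencies(r) for r in reads])
Y = weighted_pca(X, weights=np.ones(len(reads)), n_components=3)
labels = normalized_cut_bipartition(Y)
```

In this toy setting the two synthetic populations differ sharply in base composition, so a single split is enough. Real samples with more than two taxa would need the split applied recursively or a multi-way variant, and the zero threshold could be replaced by a search for the cut point that minimizes the normalized cut value.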