A novel abundance-based algorithm for binning metagenomic sequences using l-tuples

Authors:
Yu-Wei Wu;Yuzhen Ye
Affiliations:
School of Informatics and Computing, Indiana University, Bloomington, IN;School of Informatics and Computing, Indiana University, Bloomington, IN
Venue:
RECOMB'10 Proceedings of the 14th Annual International Conference on Research in Computational Molecular Biology
Year:
2010

Abstract

Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. Among the computational tools recently developed for metagenomic sequence analysis, binning tools attempt to classify all (or most) of the sequences in a metagenomic dataset into different bins (i.e., species), based on various DNA composition patterns (e.g., tetramer frequencies) of the different genomes. Composition-based binning methods, however, cannot be used to classify very short fragments, because DNA composition patterns vary substantially within a single genome. We developed a novel approach (AbundanceBin) for metagenomic binning that exploits the different abundances of species living in the same environment. AbundanceBin is an application of the Lander-Waterman model to metagenomics, based on the l-tuple content of the reads. AbundanceBin achieved accurate, unsupervised clustering of metagenomic sequences into different bins, such that the reads classified in a bin belong to species of identical or very similar abundances in the sample. In addition, AbundanceBin gave accurate estimates of species abundances as well as their genome sizes, two important parameters for characterizing a microbial community. We also show that AbundanceBin performs well when the sequences are very short (e.g., 75 bp) or contain sequencing errors.
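
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that, under a Lander-Waterman style coverage model, the number of times an l-tuple appears in the reads is roughly Poisson distributed with a rate set by the abundance of its source genome, and it fits a mixture of Poissons by expectation-maximization. The function names (`count_ltuples`, `fit_poisson_mixture`, `assign_read`), the initial rates, and the choice of EM are all assumptions made for illustration; the sketch also ignores the fact that l-tuples with zero count are never observed, and it does not estimate genome sizes.

```python
import math
from collections import Counter


def count_ltuples(reads, l):
    """Count every length-l substring (l-tuple) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - l + 1):
            counts[read[i:i + l]] += 1
    return counts


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_poisson_mixture(values, init_lams, iters=100):
    """Fit a Poisson mixture to l-tuple counts with EM (illustrative only).

    values    : observed l-tuple counts (positive integers)
    init_lams : initial guesses for the per-bin rates (hypothetical inputs)
    """
    lams = list(init_lams)
    weights = [1.0 / len(lams)] * len(lams)
    for _ in range(iters):
        # E-step: responsibility of each bin for each observed count.
        resp = []
        for k in values:
            logs = [math.log(w) + poisson_logpmf(k, lam)
                    for w, lam in zip(weights, lams)]
            top = max(logs)
            probs = [math.exp(x - top) for x in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate mixing weights and rates; the floor avoids
        # division by zero and log(0) if a bin becomes empty.
        for j in range(len(lams)):
            mass = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = mass / len(values)
            lams[j] = sum(r[j] * k for r, k in zip(resp, values)) / mass
    return weights, lams


def assign_read(read, counts, weights, lams, l):
    """Return the index of the bin with the highest log-likelihood for a read."""
    scores = [math.log(w) for w in weights]
    for i in range(len(read) - l + 1):
        k = counts[read[i:i + l]]
        for j, lam in enumerate(lams):
            scores[j] += poisson_logpmf(k, lam)
    return max(range(len(lams)), key=scores.__getitem__)
```

In this sketch, one would call `count_ltuples(reads, l)` once, pass the resulting count values to `fit_poisson_mixture` with rough starting rates, and then call `assign_read` for each read. Reads whose l-tuples are frequent end up in the high-rate bin, which is the sense in which the binning is abundance-based rather than composition-based.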