TreeLign: simultaneous stepwise alignment and phylogenetic positioning, with its application to automatic phylogenetic assignment of 16S rRNAs

  • Authors: Yuan Li; Aaron L. Halpern; Shaojie Zhang

  • Affiliations: University of Central Florida, Orlando, FL; J. Craig Venter Institute, Rockville, MD; University of Central Florida, Orlando, FL

  • Venue: Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine
  • Year: 2011

Abstract

Phylogenetic assignment of 16S rRNA sequences is frequently used for taxonomic classification. With the rise of high-throughput sequencing, especially in environmental and metagenomic sequencing projects, fast and accurate taxonomic classification has become an important goal. Existing classification methods are either fast but coarse-grained and inaccurate, or fine-grained and accurate but too slow for practical use. In this paper, we propose a new computational method, TreeLign, that rapidly and accurately aligns novel sequences and assigns them phylogenetically, given a reference phylogenetic tree and a reference alignment. TreeLign first constructs a profile for every branch of the reference tree; then, for each query sequence, it tries assigning the sequence to every possible branch; finally, it obtains a new tree and a new alignment that are jointly optimal under the Maximum Parsimony (MP) criterion. We tested the accuracy and robustness of TreeLign on a large and a small 16S rRNA dataset, both extracted from the core set of GreenGenes. The results on the large dataset show that TreeLign's assignments are generally consistent with the phylogenetic tree of the GreenGenes core set. The results on the small dataset show that TreeLign achieves accuracy comparable to that of existing maximum-likelihood-based methods while requiring much less computational time.
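
The branch-profile idea in the abstract can be illustrated with a small Python sketch. This is not the TreeLign implementation, and it makes several simplifying assumptions that the abstract does not state: the query is already aligned to the reference columns (TreeLign aligns and places simultaneously), the tree is a binary tree written as nested tuples, and a branch "profile" is taken to be the per-column union of the bottom-up Fitch state sets at the branch's two endpoints. All function names (`label`, `fitch_sets`, `branch_profiles`, `rank_branches`) and the toy data are hypothetical.

```python
# Toy parsimony-style placement of one pre-aligned query on a reference tree.
# Illustrative only: the profile definition and scoring are assumptions,
# not the method described in the paper.

def label(node):
    """Readable label: leaf name, or '(left,right)' for an internal node."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(label(child) for child in node) + ")"

def fitch_sets(node, alignment, sets):
    """Bottom-up Fitch pass; stores per-column state sets for every node."""
    if isinstance(node, str):
        result = [{ch} for ch in alignment[node]]
    else:
        left = fitch_sets(node[0], alignment, sets)
        right = fitch_sets(node[1], alignment, sets)
        # Fitch rule: intersection if non-empty, otherwise union.
        result = [(a & b) or (a | b) for a, b in zip(left, right)]
    sets[label(node)] = result
    return result

def branch_profiles(tree, alignment):
    """Assumed profile of a branch: union of its two endpoint state sets."""
    sets = {}
    fitch_sets(tree, alignment, sets)
    profiles = {}

    def walk(node):
        if isinstance(node, str):
            return
        for child in node:
            parent_sets = sets[label(node)]
            child_sets = sets[label(child)]
            profiles[(label(node), label(child))] = [
                p | c for p, c in zip(parent_sets, child_sets)
            ]
            walk(child)

    walk(tree)
    return profiles

def rank_branches(query, profiles):
    """Score the query against every branch; fewer mismatches is better."""
    scored = [
        (sum(ch not in col for ch, col in zip(query, profile)), branch)
        for branch, profile in profiles.items()
    ]
    return sorted(scored)

# Hypothetical four-taxon reference with four alignment columns.
tree = (("A", "B"), ("C", "D"))
alignment = {"A": "ACGT", "B": "ACGA", "C": "TCGT", "D": "TGGT"}

profiles = branch_profiles(tree, alignment)
ranked = rank_branches("TGGT", profiles)
for cost, (parent, child) in ranked:
    print(cost, parent, "->", child)
```

In this toy example several branches tie at the lowest cost, which is common under parsimony on short alignments; a full method would need a tie-breaking rule and would also have to update the tree and the alignment after each placement, as the abstract describes.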