Efficiently Computing Arbitrarily-Sized Robinson-Foulds Distance Matrices

  • Authors: Seung-Jin Sul; Grant Brammer; Tiffani L. Williams

  • Affiliations: Department of Computer Science, Texas A&M University, College Station, TX 77843-3112, USA (all three authors)

  • Venue: WABI '08: Proceedings of the 8th International Workshop on Algorithms in Bioinformatics
  • Year: 2008

Abstract

In this paper, we introduce the HashRF(p,q) algorithm for computing Robinson-Foulds (RF) distance matrices over large collections of binary evolutionary trees. The novelty of our algorithm is that it can compute arbitrarily-sized (p×q) RF matrices without running into physical memory limitations. We explore the performance of HashRF(p,q) on two collections of biological trees obtained from Bayesian analysis: 20,000 trees over 150 taxa and 33,306 trees over 567 taxa. When computing the all-to-all RF matrix, HashRF(p,q) is up to 200 times faster than PAUP* and around 40% faster than HashRF, one of the fastest all-to-all RF algorithms. We show an application of our approach by clustering large RF matrices to improve the resolution rate of consensus trees, which biologists commonly use to summarize the results of a phylogenetic analysis. Thus, HashRF(p,q) provides scientists with a fast and efficient alternative for understanding the evolutionary relationships among a set of trees.
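
To make the idea of a p×q RF matrix concrete, the following is a minimal Python sketch. It is not the HashRF(p,q) algorithm described in the abstract; it is a naive set-based illustration. It assumes trees are given as nested tuples of taxon names, takes the RF distance to be the number of non-trivial bipartitions found in exactly one of the two trees (some definitions halve this count), and uses hypothetical helper names (`bipartitions`, `rf_block`, `rf_matrix_blocks`). Producing the matrix one block of rows at a time is one simple way to avoid holding the full matrix in memory.

```python
def clades(tree, out):
    """Return the leaf set under `tree`, adding every internal clade to `out`."""
    if isinstance(tree, str):          # a leaf is a taxon name
        return frozenset([tree])
    leaves = frozenset().union(*(clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def bipartitions(tree):
    """Non-trivial bipartitions of `tree`, each stored as the side
    that does not contain the alphabetically smallest taxon."""
    found = set()
    taxa = clades(tree, found)
    anchor = min(taxa)
    splits = set()
    for clade in found:
        side = taxa - clade if anchor in clade else clade
        if 1 < len(side) < len(taxa) - 1:   # skip trivial splits
            splits.add(side)
    return splits


def rf_block(rows, cols):
    """RF distances between two lists of bipartition sets."""
    return [[len(a ^ b) for b in cols] for a in rows]


def rf_matrix_blocks(trees_p, trees_q, block=2):
    """Yield (row_offset, block_of_rows) so the p x q matrix is never
    materialized as a whole. `block` is an illustrative parameter."""
    cols = [bipartitions(t) for t in trees_q]
    for start in range(0, len(trees_p), block):
        rows = [bipartitions(t) for t in trees_p[start:start + block]]
        yield start, rf_block(rows, cols)


# Hypothetical five-taxon example trees.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))

for offset, rows in rf_matrix_blocks([t1, t2, t1], [t1, t2], block=2):
    print(offset, rows)
```

Each yielded block can be written to disk or passed to a clustering step before the next block is computed, so peak memory depends on the block size rather than on p×q.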