RAxML-II: a program for sequential, parallel and distributed inference of large phylogenetic trees

  • Authors:
  • Alexandros Stamatakis; Thomas Ludwig; Harald Meier

  • Affiliations:
  • Technische Universität München, Lehrstuhl für Rechnertechnik und Rechnerorganisation/I10, Boltzmannstrasse 3, D-85748 Garching b. München, Germany; Ruprecht-Karls-Universität Heidelberg, Institut für Informatik, Im Neuenheimer Feld 348, D-69120 Heidelberg, Germany; Technische Universität München, Lehrstuhl für Rechnertechnik und Rechnerorganisation/I10, Boltzmannstrasse 3, D-85748 Garching b. München, Germany

  • Venue:
  • Concurrency and Computation: Practice & Experience - Third IEEE International Workshop on High Performance Computational Biology (HiCOMB 2004)
  • Year:
  • 2005

Abstract

Inference of phylogenetic trees comprising hundreds or even thousands of organisms based on the maximum likelihood method is computationally intensive. We present simple heuristics which yield accurate trees for synthetic as well as real data and significantly reduce execution time. These heuristics have been implemented in a sequential, parallel, and distributed program called RAxML-II, which is freely available as open source code. We compare the performance of the sequential program with PHYML and MrBayes, which, to the best of our knowledge, are currently the fastest and most accurate programs for phylogenetic tree inference based on statistical methods. Experiments are conducted using 50 synthetic 100-taxon alignments as well as nine real-world alignments comprising 101 to 1000 sequences. RAxML-II outperforms MrBayes for real-world data both in terms of speed and final likelihood values. Furthermore, for real data RAxML-II requires less time (by a factor of 2–8) than PHYML to reach PHYML's final likelihood values and yields better final trees due to its more exhaustive search strategy. For synthetic data MrBayes is slightly more accurate than RAxML-II and PHYML but significantly slower. The non-deterministic parallel program shows good speedup values and has been used to infer a 10 000-taxon tree comprising organisms from the domains Eukarya, Bacteria, and Archaea. Copyright © 2005 John Wiley & Sons, Ltd.