Large-scale maximum likelihood-based phylogenetic analysis on the IBM BlueGene/L

Authors:
Michael Ott;Jaroslaw Zola;Alexandros Stamatakis;Srinivas Aluru
Affiliations:
Technical University of Munich;Iowa State University;School of Computer and Communication Sciences;Iowa State University
Venue:
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Year:
2007

Abstract

Phylogenetic inference is a grand challenge in bioinformatics due to its immense computational requirements. Multi-gene alignments are becoming increasingly popular in biological studies because they typically provide a stable topological signal, owing to a more favorable ratio of the number of base pairs to the number of sequences. Together with the rapid accumulation of sequence data in general, this trend poses new challenges for high performance computing. In this paper, we demonstrate how state-of-the-art Maximum Likelihood (ML) programs can be efficiently scaled to the IBM BlueGene/L (BG/L) architecture by porting RAxML, which is currently among the fastest and most accurate programs for phylogenetic inference under the ML criterion. We simultaneously exploit the coarse-grained and fine-grained parallelism that is inherent in every ML-based biological analysis. Performance is assessed using two datasets: one with 212 sequences and 566,470 base pairs, and one with 2,182 sequences and 51,089 base pairs. To the best of our knowledge, these are the largest datasets analyzed under ML to date. The capability to analyze such datasets will help to address novel biological questions via phylogenetic analyses. Our experimental results indicate that the fine-grained parallelization scales well up to 1,024 processors. Moreover, a larger number of processors can be efficiently exploited by combining coarse-grained and fine-grained parallelism. Finally, we demonstrate that our parallelization scales equally well on an AMD Opteron cluster, which has a less favorable ratio of network latency to processor speed. In several cases we recorded super-linear speedups, which we attribute to increased cache efficiency.
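The two levels of parallelism named in the abstract can be illustrated with a small sketch. Under the usual assumption that alignment sites evolve independently, the log-likelihood of a tree is the sum of per-site log-likelihoods (ln L = sum over sites i of ln L_i), so the sites can be divided among processors and the partial sums combined (fine grain), while independent searches can run on separate processor groups (coarse grain). This is a minimal, sequential sketch under stated assumptions, not the RAxML implementation: the per-site likelihoods are taken as precomputed inputs, the contiguous block partition and fixed group size are assumptions, and all function names are hypothetical. The final sum only stands in for a global reduction such as an MPI all-reduce.

```python
import math
from typing import List, Sequence


def partition_sites(n_sites: int, n_procs: int) -> List[range]:
    """Split site indices 0..n_sites-1 into n_procs contiguous, near-equal blocks.
    (Assumed distribution scheme, chosen for illustration only.)"""
    base, extra = divmod(n_sites, n_procs)
    blocks, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)  # first `extra` blocks get one more site
        blocks.append(range(start, start + size))
        start += size
    return blocks


def local_log_likelihood(site_lik: Sequence[float], block: range) -> float:
    """Fine grain: one processor sums ln L_i over only its own sites."""
    return sum(math.log(site_lik[i]) for i in block)


def fine_grained_log_likelihood(site_lik: Sequence[float], n_procs: int) -> float:
    """Tree log-likelihood as a sum of per-processor partial sums.

    The list comprehension stands in for work done concurrently on separate
    processors; the final sum stands in for a global reduction (for example an
    MPI all-reduce), which is not shown here.
    """
    blocks = partition_sites(len(site_lik), n_procs)
    partials = [local_log_likelihood(site_lik, b) for b in blocks]
    return sum(partials)


def group_layout(n_procs: int, group_size: int) -> List[List[int]]:
    """Coarse grain: split ranks 0..n_procs-1 into groups of group_size ranks.
    Each group would run one independent search (hypothetically, a separate
    starting tree or bootstrap replicate); ranks inside a group share its sites."""
    if group_size <= 0 or n_procs % group_size != 0:
        raise ValueError("n_procs must be a positive multiple of group_size")
    return [list(range(g, g + group_size)) for g in range(0, n_procs, group_size)]


if __name__ == "__main__":
    # Hypothetical per-site likelihoods L_i for a 10-site alignment.
    site_lik = [0.25, 0.1, 0.5, 0.05, 0.2, 0.3, 0.15, 0.4, 0.08, 0.12]
    serial = sum(math.log(x) for x in site_lik)
    print(partition_sites(10, 3))  # [range(0, 4), range(4, 7), range(7, 10)]
    print(math.isclose(fine_grained_log_likelihood(site_lik, 3), serial))  # True
    print(group_layout(8, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Because the sites are assumed independent, the partition only changes how the sum is grouped, so the fine-grained result differs from the serial value by floating-point rounding at most.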