A scalable parallelization of the gene duplication problem

Authors:
André Wehe (Department of Electrical and Computer Engineering and Department of Computer Science, Iowa State University, United States)
Wen-Chieh Chang (Department of Computer Science, Iowa State University, United States)
Oliver Eulenstein (Department of Computer Science, Iowa State University, United States)
Srinivas Aluru (Department of Electrical and Computer Engineering, Iowa State University, United States)
Venue:
Journal of Parallel and Distributed Computing
Year:
2010

Abstract

Phylogenetics is a branch of computational and evolutionary biology dealing with the inference of trees depicting evolutionary relationships among species and/or sequences. An important problem in phylogenetics is to find a species tree that is most parsimonious with a given set of gene trees, which are derived from sequencing multiple gene families from various subsets of species. The gene duplication problem is to compute a species tree that requires the minimum number of gene duplication events to reconcile it with the given set of gene trees. The best known heuristic algorithm for this NP-hard problem is a local optimization technique that runs in O(n^2 + kmn) time per search step, where k is the number of gene trees, n is the size of the species tree, and m is the maximum size of a gene tree. In this paper, we present a parallel algorithm for the gene duplication problem that runs in O((n^2 + kmn)/p) time for up to p = O(nk/log k) processors. Our algorithm exploits multiple levels of parallelism by parallelizing both the exploration of the search neighborhood and the reconciliation of the gene trees with the species trees in the neighborhood. Due to the wide variance in the sizes of the gene trees, it is difficult to completely characterize the behavior of the algorithm analytically. We present experimental results on the Blue Gene/L to study both levels of parallelism and how best to combine them to minimize overall execution time. On a large problem that took about 62.5 h on a 3 GHz Pentium 4, our parallel algorithm ran in 7.7 min on a 1024-node Blue Gene/L.
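
To make the objective concrete, the following is a minimal sequential sketch of how the duplication cost of one candidate species tree can be scored against a set of gene trees. It uses the standard LCA-mapping formulation of reconciliation, not the paper's parallel algorithm or its fast local search. The nested-tuple tree encoding, the naive ancestor-walking LCA, and all function names are assumptions chosen for illustration; gene-tree leaves are assumed to be labeled with the species they were sampled from, and both trees are assumed to be binary.

```python
def index_species_tree(tree):
    """Assign integer ids to species-tree nodes; return (parent, depth, leaf_of)."""
    parent, depth, leaf_of = {}, {}, {}
    next_id = [0]

    def visit(node, par, d):
        nid = next_id[0]
        next_id[0] += 1
        parent[nid] = par
        depth[nid] = d
        if isinstance(node, str):      # leaf: a species name
            leaf_of[node] = nid
        else:                          # internal node: tuple of two subtrees
            for child in node:
                visit(child, nid, d + 1)

    visit(tree, None, 0)
    return parent, depth, leaf_of


def lca(a, b, parent, depth):
    """Naive least common ancestor: lift the deeper node, then climb together."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def count_duplications(gene_tree, species_index):
    """Count gene-tree nodes whose LCA mapping equals that of one of their children."""
    parent, depth, leaf_of = species_index
    dups = 0

    def mapping(node):
        nonlocal dups
        if isinstance(node, str):
            return leaf_of[node]
        left = mapping(node[0])
        right = mapping(node[1])
        here = lca(left, right, parent, depth)
        if here == left or here == right:   # duplication event
            dups += 1
        return here

    mapping(gene_tree)
    return dups


def duplication_cost(gene_trees, species_tree):
    """Total duplications needed to reconcile all gene trees with one species tree."""
    index = index_species_tree(species_tree)
    return sum(count_duplications(g, index) for g in gene_trees)


# Hypothetical toy input: three gene trees scored against one species tree.
species = (("a", "b"), "c")
genes = [(("a", "b"), "c"), (("a", "c"), "b"), (("a", "a"), "b")]
print(duplication_cost(genes, species))  # prints 2
```

In this sketch the sum over gene trees is a set of independent computations, and a local search would evaluate `duplication_cost` for many neighboring species trees; these correspond to the two levels of parallelism named in the abstract, although how the authors distribute that work is not shown here.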