Phylogenetics is a branch of computational and evolutionary biology dealing with the inference of trees depicting evolutionary relationships among species and/or sequences. An important problem in phylogenetics is to find a species tree that is most parsimonious with respect to a given set of gene trees, which are derived from sequencing multiple gene families from various subsets of species. The gene duplication problem is to compute a species tree that requires the minimum number of gene duplication events to reconcile with the given set of gene trees. The best known heuristic algorithm for this NP-hard problem is a local optimization technique that runs in O(n^2+kmn) time per search step, where k is the number of gene trees, n is the size of the species tree, and m is the maximum size of a gene tree. In this paper, we present a parallel algorithm for the gene duplication problem that runs in O((n^2+kmn)/p) time for up to p=O(nklogk) processors. Our algorithm exploits multiple levels of parallelism by parallelizing both the exploration of the search neighborhood and the reconciliation of the gene trees with the species trees in the neighborhood. Due to the wide variance in the sizes of the gene trees, it is difficult to completely characterize the behavior of the algorithm analytically. We present experimental results on the Blue Gene/L to study both levels of parallelism and how they should best be combined to minimize overall execution time. On a large problem that took about 62.5 h on a 3 GHz Pentium 4, our parallel algorithm ran in 7.7 min on a 1024-node Blue Gene/L.
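The abstract does not spell out how a gene tree is reconciled with a species tree. The following is a minimal sketch, assuming the standard least-common-ancestor (LCA) mapping definition of a duplication: each gene tree node is mapped to the LCA of the species below it, and a node counts as a duplication when it maps to the same species node as one of its children. The tree encoding (nested 2-tuples with species names as leaves) and the function names `index_species_tree`, `lca`, `count_duplications`, and `reconciliation_cost` are illustrative assumptions, not the paper's implementation, and the naive LCA walk stands in for the faster LCA algorithms a real implementation would likely use.

```python
def index_species_tree(tree):
    """Return (parent, depth) maps for a species tree given as nested 2-tuples.

    Assumes every species name appears on exactly one leaf, so each
    subtree is a unique dictionary key.
    """
    parent, depth = {tree: None}, {tree: 0}
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):  # internal node: visit both children
            for child in node:
                parent[child] = node
                depth[child] = depth[node] + 1
                stack.append(child)
    return parent, depth


def lca(a, b, parent, depth):
    """Naive LCA: lift the deeper node, then climb both until they meet."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def count_duplications(gene_tree, species_tree):
    """Count duplication nodes of one gene tree under the LCA mapping."""
    parent, depth = index_species_tree(species_tree)
    dups = 0

    def mapping(g):
        nonlocal dups
        if not isinstance(g, tuple):
            return g  # a gene leaf maps to the species leaf of the same name
        left, right = (mapping(child) for child in g)
        m = lca(left, right, parent, depth)
        if m == left or m == right:  # same image as a child: duplication
            dups += 1
        return m

    mapping(gene_tree)
    return dups


def reconciliation_cost(gene_trees, species_tree):
    """Total duplications over all gene trees for a candidate species tree."""
    return sum(count_duplications(g, species_tree) for g in gene_trees)


species = (("A", "B"), "C")
gene_trees = [
    (("A", "B"), "C"),         # agrees with the species tree
    (("A", "C"), ("B", "C")),  # conflicts at the root
    (("A", "A"), "B"),         # two gene copies within species A
]
print(reconciliation_cost(gene_trees, species))
```

A local search heuristic of the kind described above would evaluate a cost like `reconciliation_cost` for every species tree in the neighborhood of the current one. Because the candidate species trees and the k gene trees can be scored independently, this is where the two levels of parallelism arise.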