Improving tree search in phylogenetic reconstruction from genome rearrangement data

Authors:
Fei Ye; Yan Guo; Andrew Lawson; Jijun Tang
Affiliations:
Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC; Department of Computer Science & Engineering, University of South Carolina, Columbia, SC; Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC; Department of Computer Science & Engineering, University of South Carolina, Columbia, SC
Venue:
WEA'07 Proceedings of the 6th international conference on Experimental algorithms
Year:
2007

Abstract

A major task in evolutionary biology is to determine the ancestral relationships among known species, a process generally referred to as phylogenetic reconstruction. In the past decade, a new type of data based on genome rearrangements has attracted increasing attention from both biologists and computer scientists. Methods for reconstructing phylogenies from genome rearrangement data include distance-based methods, direct optimization methods (GRAPPA and MGR), and Markov Chain Monte Carlo (MCMC) methods (Badger). Extensive testing on simulated and biological datasets has shown that the latter three tools (GRAPPA, MGR, and Badger) are currently the best methods for genome rearrangement phylogeny. However, all of these tools must deal with extremely large search spaces: the total number of possible trees grows exponentially with the number of genomes, which makes the search computationally very expensive. Various heuristics are used to explore the tree space, but none guarantees that the optimum will be found. In this paper, we present a new method for searching the large tree space efficiently. The method is motivated by the concept of particle filtering (also known as Sequential Monte Carlo), which was originally proposed to boost the efficiency of MCMC methods on massive data. We tested and compared the new method on simulated datasets under different scenarios. The results show that it achieves a significant improvement in efficiency while still retaining very high topological accuracy.
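
To give a sense of how quickly the tree space grows, the following minimal sketch counts tree topologies under one common assumption that the abstract itself does not state: that the candidate trees are unrooted binary trees with labeled leaves, for which the count on n leaves is the double factorial (2n-5)!!. The function name `num_unrooted_binary_trees` is hypothetical and chosen only for illustration; it is not part of GRAPPA, MGR, or Badger.

```python
def num_unrooted_binary_trees(n_leaves: int) -> int:
    """Count unrooted binary tree topologies on n labeled leaves.

    Assumption (not taken from the abstract): trees are unrooted and fully
    resolved, so the count is (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5).
    """
    if n_leaves < 3:
        raise ValueError("an unrooted binary tree needs at least 3 leaves")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n - 5 (empty product when n == 3).
    for factor in range(3, 2 * n_leaves - 4, 2):
        count *= factor
    return count


if __name__ == "__main__":
    for n in (5, 10, 15, 20):
        print(f"{n} genomes: {num_unrooted_binary_trees(n)} topologies")
```

Under this assumption, 10 genomes already yield more than two million topologies, which illustrates why exhaustive evaluation is impractical and heuristic or sampling-based search is used instead.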