Linear programming for phylogenetic reconstruction based on gene rearrangements

Authors:
Jijun Tang;Bernard M. E. Moret
Affiliations:
Dept. of Computer Science & Engineering, U. of South Carolina, Columbia, SC;Dept. of Computer Science, U. of New Mexico, Albuquerque, NM
Venue:
CPM'05 Proceedings of the 16th annual conference on Combinatorial Pattern Matching
Year:
2005

References (9)

Formulations and hardness of multiple sorting by reversals
RECOMB '99 Proceedings of the third annual international conference on Computational molecular biology

Finding the k Shortest Paths
SIAM Journal on Computing

Steps toward accurate reconstructions of phylogenies from gene-order data
Journal of Computer and System Sciences - Computational biology 2002

A New Fast Heuristic for Computing the Breakpoint Phylogeny and Experimental Phylogenetic Analyses of Real and Synthetic Data
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology

On the Practical Solution of the Reversal Median Problem
WABI '01 Proceedings of the First International Workshop on Algorithms in Bioinformatics

Finding an Optimal Inversion Median: Experimental Results
WABI '01 Proceedings of the First International Workshop on Algorithms in Bioinformatics

Genome Rearrangement by Reversals and Insertions/Deletions of Contiguous Segments
CPM '00 Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching

GESTALT: Genomic Steiner Alignments
CPM '99 Proceedings of the 10th Annual Symposium on Combinatorial Pattern Matching

Phylogenetic Reconstruction from Arbitrary Gene-Order Data
BIBE '04 Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering

Cited by (3)

Improving tree search in phylogenetic reconstruction from genome rearrangement data
WEA'07 Proceedings of the 6th international conference on Experimental algorithms

Lower bounds for maximum parsimony with gene order data
RCG'05 Proceedings of the 2005 international conference on Comparative Genomics

Quartet-based phylogeny reconstruction from gene orders
COCOON'05 Proceedings of the 11th annual international conference on Computing and Combinatorics

Abstract

Phylogenetic reconstruction from gene rearrangements has attracted increasing attention from biologists and computer scientists over the last few years. Methods used in reconstruction include distance-based methods, parsimony methods using sequence-based encodings, and direct optimization. The last of these, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach, but has been limited to small genomes because the running time of its scoring algorithm grows exponentially with the number of genes in the genome. We report here on a new method to compute a tight lower bound on the score of a given tree, using a set of linear constraints generated through selective applications of the triangle inequality (in the spirit of GESTALT). Our method generates an integer linear program with a carefully limited number of constraints, rapidly solves its relaxed version, and uses the result to provide a tight lower bound. Since this bound is very close to the optimal tree score, it can be used directly as a selection criterion, thereby enabling us to bypass entirely the expensive scoring procedure. We have implemented this method within our GRAPPA software and run several series of experiments on both biological and simulated datasets to assess its accuracy. Our results show that using the bound as a selection criterion yields excellent trees, with error rates below 5% up to very large evolutionary distances, consistently beating the baseline Neighbor-Joining. Our new method enables us to extend the range of applicability of the direct optimization method to chromosomes of size comparable to those of bacteria, as well as to datasets with complex combinations of evolutionary events.
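
To make the idea of triangle-inequality constraints concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' GRAPPA implementation and not their constraint-selection scheme. It assumes one simple constraint family: for every pair of leaves, the edge lengths along the tree path between them must sum to at least the pairwise distance. Instead of solving the relaxed linear program, it evaluates the classic circular-ordering bound, which corresponds to one feasible dual solution, so the LP optimum over this constraint family can only be equal or larger. The quartet tree, the distances, and all function names are hypothetical.

```python
from itertools import combinations

def leaf_paths(tree, leaves):
    """Map each leaf pair (a, b) to the edges on the tree path from a to b.

    Each entry encodes one triangle-inequality constraint:
    sum of edge lengths on the path >= dist(a, b).
    """
    def path(src, dst):
        parent = {src: None}
        stack = [src]
        while stack:                      # depth-first search from src
            u = stack.pop()
            for v in tree[u]:
                if v not in parent:
                    parent[v] = u
                    stack.append(v)
        edges, node = [], dst
        while parent[node] is not None:   # walk back from dst to src
            edges.append(frozenset((node, parent[node])))
            node = parent[node]
        return edges
    return {(a, b): path(a, b) for a, b in combinations(leaves, 2)}

def dfs_leaf_order(tree, root, leaves):
    """Leaves in the order a depth-first traversal from root reaches them."""
    order, seen, stack = [], {root}, [root]
    while stack:
        u = stack.pop()
        if u in leaves:
            order.append(u)
        for v in tree[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return order

def circular_lower_bound(tree, dist, leaves):
    """Half the length of the leaf tour taken in depth-first order.

    The tour crosses every tree edge exactly twice, so any edge lengths
    satisfying the path constraints have total length >= this value.
    """
    order = dfs_leaf_order(tree, leaves[0], set(leaves))
    n = len(order)
    tour = sum(dist[frozenset((order[i], order[(i + 1) % n]))] for i in range(n))
    return tour / 2

def satisfies_constraints(paths, dist, length):
    """Check candidate edge lengths against every path constraint."""
    return all(sum(length[e] for e in edges) >= dist[frozenset(pair)]
               for pair, edges in paths.items())

# Hypothetical quartet ((A,B),(C,D)) with internal nodes X and Y.
tree = {"A": ["X"], "B": ["X"], "C": ["Y"], "D": ["Y"],
        "X": ["A", "B", "Y"], "Y": ["X", "C", "D"]}
leaves = ["A", "B", "C", "D"]
# Hypothetical pairwise distances between the leaf genomes.
dist = {frozenset(p): d for p, d in [
    (("A", "B"), 5), (("A", "C"), 7), (("A", "D"), 5),
    (("B", "C"), 8), (("B", "D"), 6), (("C", "D"), 6)]}
# One candidate assignment of edge lengths (total 12).
lengths = {frozenset(e): w for e, w in [
    (("A", "X"), 2), (("B", "X"), 3), (("X", "Y"), 1),
    (("C", "Y"), 4), (("D", "Y"), 2)]}

paths = leaf_paths(tree, leaves)
bound = circular_lower_bound(tree, dist, leaves)
print(bound)                                       # 12.0
print(satisfies_constraints(paths, dist, lengths)) # True
```

In this toy instance the candidate edge lengths satisfy every constraint and their total equals the bound, so the bound is tight. With real gene-order distances the two values would generally differ, and an LP solver applied to a selected subset of constraints, as the abstract describes, would take the place of the closed-form bound used here.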