Formulations and hardness of multiple sorting by reversals
RECOMB '99 Proceedings of the third annual international conference on Computational molecular biology
SIAM Journal on Computing
Steps toward accurate reconstructions of phylogenies from gene-order data
Journal of Computer and System Sciences - Computational biology 2002
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
On the Practical Solution of the Reversal Median Problem
WABI '01 Proceedings of the First International Workshop on Algorithms in Bioinformatics
Finding an Optimal Inversion Median: Experimental Results
WABI '01 Proceedings of the First International Workshop on Algorithms in Bioinformatics
Genome Rearrangement by Reversals and Insertions/Deletions of Contiguous Segments
CPM '00 Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching
GESTALT: Genomic Steiner Alignments
CPM '99 Proceedings of the 10th Annual Symposium on Combinatorial Pattern Matching
Phylogenetic Reconstruction from Arbitrary Gene-Order Data
BIBE '04 Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering
Improving tree search in phylogenetic reconstruction from genome rearrangement data
WEA'07 Proceedings of the 6th international conference on Experimental algorithms
Lower bounds for maximum parsimony with gene order data
RCG'05 Proceedings of the 2005 international conference on Comparative Genomics
Quartet-based phylogeny reconstruction from gene orders
COCOON'05 Proceedings of the 11th annual international conference on Computing and Combinatorics
Phylogenetic reconstruction from gene rearrangements has attracted increasing attention from biologists and computer scientists over the last few years. Reconstruction methods include distance-based methods, parsimony methods using sequence-based encodings, and direct optimization. Direct optimization, pioneered by Sankoff and extended by us in the software suite GRAPPA, is the most accurate approach, but it has been limited to small genomes because the running time of its scoring algorithm grows exponentially with the number of genes in the genome. We report here on a new method for computing a tight lower bound on the score of a given tree, using a set of linear constraints generated through selective applications of the triangle inequality (in the spirit of GESTALT). Our method generates an integer linear program with a carefully limited number of constraints, rapidly solves its relaxed version, and uses the result as the lower bound. Because this bound is very close to the optimal tree score, it can be used directly as a selection criterion, which lets us bypass the expensive scoring procedure entirely. We have implemented this method within GRAPPA and run several series of experiments on both biological and simulated datasets to assess its accuracy. Our results show that using the bound as a selection criterion yields excellent trees, with error rates below 5% up to very large evolutionary distances, consistently beating the baseline Neighbor-Joining method. The new method extends the range of applicability of direct optimization to chromosomes of size comparable to those of bacteria, as well as to datasets with complex combinations of evolutionary events.
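
To make the idea of a triangle-inequality lower bound concrete, the following minimal sketch computes a simpler and generally weaker bound than the linear-programming bound described in the abstract; it is not the authors' method. It relies on one observation: walking around the leaves of a tree in depth-first order crosses every edge exactly twice, and by the triangle inequality each leaf-to-leaf distance is at most the length of the tree path between the two leaves, so half the sum of distances between circularly consecutive leaves cannot exceed the tree score. The nested-tuple tree encoding, the function names, and the distance values are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: a tree is a nested tuple whose leaves are genome names.
Tree = Union[str, Tuple["Tree", ...]]


def leaves_in_order(tree: Tree) -> List[str]:
    """Return the leaf names in left-to-right (depth-first) order."""
    if isinstance(tree, str):
        return [tree]
    order: List[str] = []
    for child in tree:
        order.extend(leaves_in_order(child))
    return order


def circular_lower_bound(tree: Tree, dist: Dict[frozenset, int]) -> float:
    """Half the sum of distances between circularly consecutive leaves.

    Assumes at least two leaves and a distance for every unordered pair
    of consecutive leaves, keyed by frozenset({a, b}).
    """
    order = leaves_in_order(tree)
    total = 0
    for i, leaf in enumerate(order):
        nxt = order[(i + 1) % len(order)]  # wrap around to close the circle
        total += dist[frozenset((leaf, nxt))]
    return total / 2


# Made-up pairwise distances between four genomes (illustration only).
example_dist = {
    frozenset(("A", "B")): 4,
    frozenset(("B", "C")): 7,
    frozenset(("C", "D")): 3,
    frozenset(("A", "D")): 8,
    frozenset(("A", "C")): 6,
    frozenset(("B", "D")): 9,
}
example_tree = (("A", "B"), ("C", "D"))

print(circular_lower_bound(example_tree, example_dist))  # (4 + 7 + 3 + 8) / 2 = 11.0
```

A bound of this kind uses only the constraints for consecutive leaf pairs with a fixed weighting. The approach in the abstract instead selects a limited set of triangle-inequality constraints and lets a relaxed linear program combine them, which is how it can obtain a tighter value.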