Efficiently finding the most parsimonious phylogenetic tree via linear programming

  • Authors:
  • Srinath Sridhar;Fumei Lam;Guy E. Blelloch;R. Ravi;Russell Schwartz

  • Affiliations:
  • Computer Science Department, Carnegie Mellon University;Tepper School of Business, Carnegie Mellon University;Computer Science Department, Carnegie Mellon University;Tepper School of Business, Carnegie Mellon University;Department of Biological Sciences, Carnegie Mellon University

  • Venue:
  • ISBRA'07 Proceedings of the 3rd international conference on Bioinformatics research and applications
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Reconstruction of phylogenetic trees is a fundamental problem in computational biology. While excellent heuristic methods are available for many variants of this problem, new advances in phylogeny inference will be required if we are to be able to continue to make effective use of the rapidly growing stores of variation data now being gathered. In this paper, we introduce an integer linear programming formulation to find the most parsimonious phylogenetic tree from a set of binary variation data. The method uses a flow-based formulation that could use exponential numbers of variables and constraints in the worst case. The method has, however, proved extremely efficient in practice on datasets that are well beyond the reach of the available provably efficient methods. The program solves several large mtDNA and Y-chromosome instances within a few seconds, giving provably optimal results in times competitive with fast heuristics than cannot guarantee optimality.