Integer programming formulations and computations solving phylogenetic and population genetic problems with missing or genotypic data

  • Authors:
  • Dan Gusfield; Yelena Frid; Dan Brown

  • Affiliations:
  • Department of Computer Science, University of California, Davis; Department of Computer Science, University of California, Davis; David R. Cheriton School of Computer Science, University of Waterloo, Canada

  • Venue:
  • COCOON'07: Proceedings of the 13th Annual International Conference on Computing and Combinatorics
  • Year:
  • 2007

Abstract

Several central and well-known combinatorial problems in phylogenetics and population genetics have efficient, elegant solutions when the input is complete or consists of haplotype data. They lack efficient solutions when the input is incomplete or consists of genotype data, or when the problem is generalized from a decision question to an optimization question. Unfortunately, these harder problems arise very often in biological applications. Previous research has shown that integer linear programming can sometimes solve hard problems in practice on a range of data that is realistic for current biological applications. Here, we describe a set of related integer linear programming (ILP) formulations for several additional problems, most of which are known to be NP-hard. These formulations either address the issue of missing data or solve Haplotype Inference Problems with objective functions that model more complex biological phenomena than previous formulations do. They solve efficiently on data whose composition reflects a range of data of current biological interest. We also assess the biological quality of the ILP solutions: some of the problems, though not all, are solved with excellent quality. These results give a practical way to solve instances of some central, hard biological problems, and a practical way to assess how well certain natural objective functions reflect complex biological phenomena. Perl code to generate the ILPs (for input to CPLEX) is on the web at wwwcsif.cs.ucdavis.edu/~gusfield.
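The abstract does not name the specific easy cases. One classic example of an "efficient, elegant solution" for complete haplotype data is the four-gamete test for the binary perfect phylogeny problem with an unspecified ancestral sequence. A perfect phylogeny exists iff no pair of columns contains all four gametes 00, 01, 10 and 11. The sketch below is an illustrative choice, not the paper's method. The helper names are hypothetical, and rows are assumed to be equal-length strings over `0`/`1`, with `?` marking missing entries. The second function shows why missing data is harder. Applied directly, it must try every 0/1 completion, which is exponential in the number of `?` entries. The paper instead addresses such problems with ILP formulations solved by CPLEX, and this sketch does not reproduce those formulations.

```python
from itertools import combinations, product


def passes_four_gamete_test(rows):
    """Return True if no pair of columns contains all four gametes 00, 01, 10, 11.

    Assumes complete binary data: equal-length strings over {'0', '1'}.
    """
    if not rows:
        return True
    m = len(rows[0])
    for i, j in combinations(range(m), 2):
        # Collect the distinct (column i, column j) value pairs across all rows.
        gametes = {(row[i], row[j]) for row in rows}
        if len(gametes) == 4:
            return False
    return True


def completable_to_perfect_phylogeny(rows):
    """Hypothetical brute-force helper for data with missing entries ('?').

    Tries every 0/1 assignment of the '?' cells and checks each completion
    with the four-gamete test. Running time grows as 2**(number of '?').
    """
    holes = [(r, c) for r, row in enumerate(rows)
             for c, ch in enumerate(row) if ch == '?']
    for values in product('01', repeat=len(holes)):
        grid = [list(row) for row in rows]
        for (r, c), v in zip(holes, values):
            grid[r][c] = v
        if passes_four_gamete_test([''.join(g) for g in grid]):
            return True
    return False


if __name__ == "__main__":
    print(passes_four_gamete_test(["00", "01", "10"]))                # True
    print(passes_four_gamete_test(["00", "01", "10", "11"]))          # False
    print(completable_to_perfect_phylogeny(["00", "01", "1?", "11"]))  # True ('?' -> 1)
```

In the last call, filling the `?` with 0 creates all four gametes. Filling it with 1 does not, so a completion with a perfect phylogeny exists.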