Efficient haplotype inference with pseudo-boolean optimization

  • Authors:
  • Ana Graça; João Marques-Silva; Inês Lynce; Arlindo L. Oliveira

  • Affiliations:
  • IST/INESC-ID, Technical University of Lisbon, Portugal; School of Electronics and Computer Science, University of Southampton, UK; IST/INESC-ID, Technical University of Lisbon, Portugal; IST/INESC-ID, Technical University of Lisbon, Portugal

  • Venue:
  • AB'07 Proceedings of the 2nd international conference on Algebraic biology
  • Year:
  • 2007

Abstract

Haplotype inference from genotype data is a key computational problem in bioinformatics, since directly retrieving haplotype information from DNA samples is not feasible with existing technology. One method for solving this problem uses the pure parsimony criterion, which seeks the smallest set of distinct haplotypes that explains all the given genotypes; this approach is known as Haplotype Inference by Pure Parsimony (HIPP). Initial work in this area was based on a number of different Integer Linear Programming (ILP) models and branch-and-bound algorithms. Recent work has shown that a Boolean Satisfiability (SAT) formulation combined with state-of-the-art SAT solvers is the most efficient approach for solving the HIPP problem. Motivated by the promising results obtained with SAT techniques, this paper investigates the use of modern Pseudo-Boolean Optimization (PBO) algorithms for solving the HIPP problem. The paper starts by applying PBO to existing ILP models. The results are promising and motivate the development of a new PBO model (RPoly) for the HIPP problem, which has a compact representation and eliminates key symmetries. Experimental results indicate that RPoly outperforms the SAT-based approach on most problem instances and is, in general, significantly more efficient.
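To make the pure parsimony criterion concrete, here is a minimal brute-force sketch. It is not the paper's RPoly model or any PBO/SAT encoding; it simply enumerates candidate haplotype sets by increasing size and returns the first one that explains every genotype, so it is only usable on toy inputs. It assumes the common site encoding in which '0' and '1' denote homozygous sites and '2' denotes a heterozygous site. The function names `explaining_pairs` and `pure_parsimony` are hypothetical.

```python
from itertools import combinations, product


def explaining_pairs(genotype: str) -> list[tuple[str, str]]:
    """All unordered haplotype pairs (h1, h2) that explain a genotype.

    Assumed encoding: '0'/'1' = homozygous site, '2' = heterozygous site.
    At a heterozygous site the two haplotypes must carry different alleles;
    at a homozygous site both copy the genotype value.
    """
    het = [i for i, c in enumerate(genotype) if c == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        h1, h2 = list(genotype), list(genotype)
        for i, b in zip(het, bits):
            h1[i] = b
            h2[i] = "1" if b == "0" else "0"
        # Sort so (a, b) and (b, a) collapse to one unordered pair.
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


def pure_parsimony(genotypes: list[str]) -> set[str]:
    """Smallest haplotype set explaining every genotype (exhaustive search).

    Hypothetical illustration of the HIPP objective; exponential in the input.
    """
    options = [explaining_pairs(g) for g in genotypes]
    candidates = sorted({h for opts in options for pair in opts for h in pair})
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            chosen = set(subset)
            # Each genotype needs at least one explaining pair inside the set.
            if all(any(a in chosen and b in chosen for a, b in opts)
                   for opts in options):
                return chosen
    return set(candidates)  # unreachable: all candidates always suffice


print(pure_parsimony(["22", "02", "20"]))
```

For the genotypes `22`, `02` and `20`, a naive choice for `22` (the pair `00`/`11`) would need four haplotypes, whereas the parsimonious solution `{00, 01, 10}` explains all three genotypes with three. Pruning this exponential search space is what the ILP, SAT and PBO formulations discussed above aim to do efficiently.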