Haplotype inference from genotype data is a key step towards a better understanding of the role that genetic variation plays in inherited diseases. One of the most promising approaches uses the pure parsimony criterion. This approach, called Haplotype Inference by Pure Parsimony (HIPP), aims at minimising the number of haplotypes required to explain a given set of genotypes, a problem that is NP-hard. The HIPP problem is often solved using constraint satisfaction techniques, for which the upper bound on the number of required haplotypes is a key issue. Another very well-known approach is Clark's method, which resolves genotypes by greedily selecting an explaining pair of haplotypes. In this work, we combine the basic idea of Clark's method with a more sophisticated method for selecting explaining haplotypes, in order to explicitly introduce a bias towards parsimonious explanations. The new algorithm can be used either to obtain an approximate solution to the HIPP problem or to obtain an upper bound on the size of the pure parsimony solution. This upper bound can then be used to encode the problem efficiently as a constraint satisfaction problem. The experimental evaluation, conducted on a large set of real and artificially generated examples, shows that the new method is much more effective than Clark's method at obtaining parsimonious solutions, while retaining the simplicity and speed of Clark's method.
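To make the greedy idea concrete, the following is a minimal sketch of a Clark-style resolution loop with a simple bias towards reusing haplotypes that have already been selected. It is an illustration under stated assumptions, not the selection rule of the algorithm described above: genotypes are assumed to be strings over `0`, `1` and `2` (with `2` marking a heterozygous site), haplotypes are strings over `0` and `1`, and the names `compatible`, `complement`, `explains` and `greedy_resolve`, the cost function, and the fallback for genotypes that no selected haplotype can explain are all hypothetical choices made for this example.

```python
def compatible(h, g):
    """True if haplotype h can be one of the two haplotypes explaining genotype g."""
    return all(gs == "2" or gs == hs for hs, gs in zip(h, g))


def complement(h, g):
    """Haplotype that, paired with h, explains g (assumes compatible(h, g))."""
    return "".join(
        gs if gs != "2" else ("1" if hs == "0" else "0") for hs, gs in zip(h, g)
    )


def explains(h1, h2, g):
    """True if the pair (h1, h2) explains genotype g at every site."""
    return all(
        (a != b) if gs == "2" else (a == b == gs) for a, b, gs in zip(h1, h2, g)
    )


def greedy_resolve(genotypes):
    """Resolve genotypes greedily, preferring pairs that add no new haplotype."""
    selected, pairs, unresolved = set(), {}, []
    # Genotypes with at most one heterozygous site have a unique explanation.
    for g in dict.fromkeys(genotypes):  # drop duplicate genotypes, keep order
        if g.count("2") <= 1:
            h = g.replace("2", "0")
            pairs[g] = (h, complement(h, g))
            selected.update(pairs[g])
        else:
            unresolved.append(g)
    while unresolved:
        best = None  # (number of new haplotypes, genotype, h, complement of h)
        for g in unresolved:
            for h in sorted(selected):  # sorted only to make the sketch deterministic
                if compatible(h, g):
                    c = complement(h, g)
                    cost = 0 if c in selected else 1
                    if best is None or cost < best[0]:
                        best = (cost, g, h, c)
        if best is None:
            # Assumed fallback: no selected haplotype fits any remaining
            # genotype, so resolve the first one arbitrarily.
            g = unresolved[0]
            h = g.replace("2", "0")
            best = (2, g, h, complement(h, g))
        _, g, h, c = best
        pairs[g] = (h, c)
        selected.update((h, c))
        unresolved.remove(g)
    return pairs, selected


genotypes = ["0101", "2201", "2221", "1011"]
pairs, haplotypes = greedy_resolve(genotypes)
print(len(haplotypes))
```

Because every genotype ends up with an explaining pair, the size of the returned haplotype set is a valid upper bound on the size of the pure parsimony solution for the input, whatever selection rule is used. The quality of that bound depends on the rule, which is where a more careful selection strategy than this sketch would matter.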