This paper studies haplotype inference by maximum parsimony using population data. We define the optimal haplotype inference (OHI) problem as follows: given a set of genotypes and a set of related haplotypes, find a minimum subset of the haplotypes that can resolve all the genotypes. We prove that OHI is NP-hard and can be formulated as an integer quadratic programming (IQP) problem. To solve the IQP problem, we propose an iterative approximation algorithm based on semidefinite programming, called SDPHapInfer. We show that this algorithm finds a solution within a factor of O(log n) of the optimal solution, where n is the number of genotypes. The algorithm has been implemented and tested on a variety of simulated and biological data and compared with three other methods: HAPAR, HAPLOTYPER, and PHASE. The experimental results indicate that SDPHapInfer and HAPLOTYPER have similar error rates, whereas PHASE has lower error rates on some data sets but higher error rates on others. On biological data, HAPAR has higher error rates than the other methods. In terms of efficiency, SDPHapInfer, HAPLOTYPER, and PHASE output solutions in a stable and consistent way, and they run much faster than HAPAR when the number of genotypes becomes large.
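To make the OHI problem statement concrete, the following is a minimal brute-force sketch for tiny instances. It is not SDPHapInfer and does not use semidefinite programming; it simply enumerates haplotype subsets in order of increasing size. The string encoding is an assumption that the abstract does not specify: haplotypes are strings over `0`/`1`, genotypes are strings over `0`/`1`/`2`, and `2` marks a heterozygous site. The function names `resolves` and `min_resolving_subset` are hypothetical.

```python
from itertools import combinations


def resolves(h1, h2, g):
    """Return True if the haplotype pair (h1, h2) explains genotype g.

    Assumed encoding: at a homozygous site ('0' or '1') both haplotypes
    must carry that allele; at a heterozygous site ('2') they must differ.
    """
    if not (len(h1) == len(h2) == len(g)):
        return False
    for a, b, site in zip(h1, h2, g):
        if site == "2":
            if a == b:
                return False
        elif a != site or b != site:
            return False
    return True


def min_resolving_subset(genotypes, haplotypes):
    """Exhaustively find a smallest subset of haplotypes resolving all genotypes.

    Returns a list of haplotypes, or None if no subset works.
    Running time is exponential in the number of candidate haplotypes.
    """
    if not genotypes:
        return []
    candidates = sorted(set(haplotypes))
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            # A haplotype may pair with itself (fully homozygous genotype).
            if all(
                any(resolves(h1, h2, g) for h1 in subset for h2 in subset)
                for g in genotypes
            ):
                return list(subset)
    return None


if __name__ == "__main__":
    genotypes = ["22", "02", "20"]
    haplotypes = ["00", "01", "10", "11"]
    print(min_resolving_subset(genotypes, haplotypes))
```

In this toy instance, genotype `02` can only be resolved by the pair (`00`, `01`) and `20` only by (`00`, `10`), so at least three haplotypes are needed; the pair (`01`, `10`) then also resolves `22`. The exhaustive search is practical only for very small inputs, which is consistent with the NP-hardness result stated above.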