Haplotype Inference by Pure Parsimony

Authors:
Dan Gusfield
Affiliations:
Computer Science Department, University of California, Davis, Davis, CA
Venue:
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Year:
2003

Citing 0
Cited 58

A note on the single genotype resolution problem
Journal of Computer Science and Technology

An approximation algorithm for haplotype inference by maximum parsimony
Proceedings of the 2005 ACM symposium on Applied computing

Islands of Tractability for Parsimony Haplotyping
CSB '05 Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference

Integer Programming Approaches to Haplotype Inference by Pure Parsimony
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Islands of Tractability for Parsimony Haplotyping
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Haplotyping Populations by Pure Parsimony: Complexity of Exact and Approximation Algorithms
INFORMS Journal on Computing

Computational Problems in Noisy SNP and Haplotype Analysis: Block Scores, Block Identification, and Population Stratification
INFORMS Journal on Computing

The phasing of heterozygous traits: Algorithms and complexity
Computers & Mathematics with Applications

Family trio phasing and missing data recovery
International Journal of Bioinformatics Research and Applications

Boosting Haplotype Inference with Local Search
Constraints

Haplotyping for Disease Association: A Combinatorial Approach
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Highly Scalable Genotype Phasing by Entropy Minimization
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Shorelines of Islands of Tractability: Algorithms for Parsimony and Minimum Perfect Phylogeny Haplotyping Problems
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

2SNP: Scalable Phasing Method for Trios and Unrelated Individuals
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Stochastic local search for large-scale instances of the haplotype inference problem by pure parsimony
Journal of Algorithms

Haplotype Inferring Via Galled-Tree Networks Is NP-Complete
COCOON '08 Proceedings of the 14th annual international conference on Computing and Combinatorics

Two-Level ACO for Haplotype Inference Under Pure Parsimony
ANTS '08 Proceedings of the 6th international conference on Ant Colony Optimization and Swarm Intelligence

Mixed Integer Linear Programming for Maximum-Parsimony Phylogeny Inference
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

A Set-Covering Approach with Column Generation for Parsimony Haplotyping
INFORMS Journal on Computing

Haplotype inferring via galled-tree networks using a hypergraph covering problem for special genotype matrices
Discrete Applied Mathematics

Pure Parsimony Xor Haplotyping
ISBRA '09 Proceedings of the 5th International Symposium on Bioinformatics Research and Applications

A Decomposition of the Pure Parsimony Haplotyping Problem
ISBRA '09 Proceedings of the 5th International Symposium on Bioinformatics Research and Applications

On the Approximability of Some Haplotyping Problems
AAIM '09 Proceedings of the 5th International Conference on Algorithmic Aspects in Information and Management

Haplotype Inference Constrained by Plausible Haplotype Data
CPM '09 Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching

Efficient haplotype inference with boolean satisfiability
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1

HAPLO-ASP: Haplotype Inference Using Answer Set Programming
LPNMR '09 Proceedings of the 10th International Conference on Logic Programming and Nonmonotonic Reasoning

Efficient haplotype inference with answer set programming
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 1

Efficient haplotype inference with answer set programming
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 3

A new preprocessing procedure for the haplotype inference problem
CEC'09 Proceedings of the Eleventh conference on Congress on Evolutionary Computation

Efficiently finding the most parsimonious phylogenetic tree via linear programming
ISBRA'07 Proceedings of the 3rd international conference on Bioinformatics research and applications

Efficient haplotype inference with pseudo-boolean optimization
AB'07 Proceedings of the 2nd international conference on Algebraic biology

Indexing a dictionary for subset matching queries
SPIRE'07 Proceedings of the 14th international conference on String processing and information retrieval

Efficient and tight upper bounds for haplotype inference by pure parsimony using delayed haplotype selection
EPIA'07 Proceedings of the artificial intelligence 13th Portuguese conference on Progress in artificial intelligence

Efficient haplotype inference with combined CP and OR techniques
CPAIOR'08 Proceedings of the 5th international conference on Integration of AI and OR techniques in constraint programming for combinatorial optimization problems

A Class Representative Model for Pure Parsimony Haplotyping
INFORMS Journal on Computing

Constructing majority-rule supertrees
WABI'09 Proceedings of the 9th international conference on Algorithms in bioinformatics

CollHaps: A Heuristic Approach to Haplotype Inference by Parsimony
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

SplittingHeirs: inferring haplotypes by optimizing resultant dense graphs
Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology

Phylogeny- and parsimony-based haplotype inference with constraints
CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching

Xor perfect phylogeny haplotyping in pedigrees
ICIC'10 Proceedings of the Advanced intelligent computing theories and applications, and 6th international conference on Intelligent computing

Insights on haplotype inference on large genotype datasets
BSB'10 Proceedings of the Advances in bioinformatics and computational biology, and 5th Brazilian conference on Bioinformatics

On building minimal automaton for subset matching queries
Information Processing Letters

Pure Parsimony Xor Haplotyping
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Haplotype Inference Constrained by Plausible Haplotype Data
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Phylogenetic network inferences through efficient haplotyping
WABI'06 Proceedings of the 6th international conference on Algorithms in Bioinformatics

Beaches of islands of tractability: algorithms for parsimony and minimum perfect phylogeny haplotyping problems
WABI'06 Proceedings of the 6th international conference on Algorithms in Bioinformatics

A hidden markov technique for haplotype reconstruction
WABI'05 Proceedings of the 5th International conference on Algorithms in Bioinformatics

Phasing and missing data recovery in family trios
ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II

Phylogeny- and parsimony-based haplotype inference with constraints
Information and Computation

SAT in bioinformatics: making the case with haplotype inference
SAT'06 Proceedings of the 9th international conference on Theory and Applications of Satisfiability Testing

Indexing a dictionary for subset matching queries
Algorithms and Applications

Minimum multicolored subgraph problem in multiplex PCR primer set selection and population haplotyping
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II

Phasing of 2-SNP genotypes based on non-random mating model
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II

A faster haplotyping algorithm based on block partition, and greedy ligation strategy
ICIC'11 Proceedings of the 7th international conference on Intelligent Computing: bio-inspired computing and applications

Efficient and accurate haplotype inference by combining parsimony and pedigree information
ANB'10 Proceedings of the 4th international conference on Algebraic and Numeric Biology

Hap-seq: an optimal algorithm for haplotype phasing with imputation using sequencing data
RECOMB'12 Proceedings of the 16th Annual international conference on Research in Computational Molecular Biology

A polynomial case of the parsimony haplotyping problem
Operations Research Letters

Integer programming formulations and computations solving phylogenetic and population genetic problems with missing or genotypic data
COCOON'07 Proceedings of the 13th annual international conference on Computing and Combinatorics

Abstract

The next high-priority phase of human genomics will involve the development and use of a full Haplotype Map of the human genome [7]. A critical, perhaps dominating, problem in all such efforts is the inference of large-scale SNP haplotypes from raw genotype SNP data. This is called the Haplotype Inference (HI) problem. Abstractly, the input to the HI problem is a set of n strings over a ternary alphabet. A solution is a set of at most 2n strings over the binary alphabet such that each input string can be "generated" by some pair of the binary strings in the solution. For greatest biological fidelity, a solution should be consistent with, or evaluated by, properties derived from an appropriate genetic model. A natural model that has been suggested repeatedly, called here the Pure Parsimony model, sets the goal of finding a smallest set of binary strings that can generate the n input strings. The problem of finding such a smallest set is called the Pure Parsimony problem.

Unfortunately, the Pure Parsimony problem is NP-hard, and no paper has previously shown how an optimal Pure Parsimony solution can be computed efficiently for problem instances of the size of current biological interest. In this paper, we show how to formulate the Pure Parsimony problem as an integer linear program; we explain how to improve the practicality of the integer programming formulation; and we present the results of extensive experimentation we have done to show the time and memory practicality of the method, and to compare its accuracy against solutions found by the widely used general haplotyping program PHASE. We also formulate and experiment with variations of the Pure Parsimony criterion that allow greater practicality.

The results are that the Pure Parsimony problem can be solved efficiently in practice for a wide range of problem instances of current interest in biology. Both the time needed for a solution and the accuracy of the solution depend on the level of recombination in the input strings. The speed of the solution improves with increasing recombination, but the accuracy of the solution decreases with increasing recombination.
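
To make the problem statement in the abstract concrete, the following Python sketch checks whether a pair of binary strings "generates" a ternary input string, and finds a smallest generating set by exhaustive search on a toy instance. It is an illustration only and is not the integer programming method of the paper. Several details are assumptions, since the abstract does not fix them: the ternary symbols are taken to be `0` and `1` for sites where both haplotypes agree and `2` for sites where they differ, and the function names `generates`, `candidate_haplotypes`, and `pure_parsimony_bruteforce` are hypothetical. The search is exponential in the input size, so it is usable only on very small examples.

```python
from itertools import combinations, product


def generates(h1, h2, g):
    """Return True if binary strings h1 and h2 together explain genotype g.

    Assumed encoding (not stated in the abstract): '0' or '1' in g means
    both haplotypes carry that value at the site; '2' means they differ.
    """
    if not (len(h1) == len(h2) == len(g)):
        return False
    for a, b, s in zip(h1, h2, g):
        if s == "2":
            if a == b:
                return False
        elif a != s or b != s:
            return False
    return True


def candidate_haplotypes(g):
    """All binary strings that could take part in a pair explaining g."""
    options = [("0", "1") if s == "2" else (s,) for s in g]
    return {"".join(p) for p in product(*options)}


def pure_parsimony_bruteforce(genotypes):
    """Smallest set of haplotypes explaining every genotype (toy sizes only).

    Only haplotypes compatible with at least one genotype are considered,
    because any other haplotype could be dropped from a solution.
    """
    pool = sorted(set().union(*(candidate_haplotypes(g) for g in genotypes)))
    for k in range(1, len(pool) + 1):
        for subset in combinations(pool, k):
            if all(
                any(generates(h1, h2, g) for h1 in subset for h2 in subset)
                for g in genotypes
            ):
                return list(subset)
    return []


if __name__ == "__main__":
    genotypes = ["22", "02", "00"]
    solution = pure_parsimony_bruteforce(genotypes)
    print(len(solution), solution)  # a smallest set for this instance has 3 strings
```

On this three-genotype instance no set of two binary strings explains all inputs, so the smallest solution has three strings, well under the trivial bound of 2n = 6. Exhaustive search stops being feasible almost immediately as n and the number of `2` sites grow, which is consistent with the NP-hardness noted in the abstract and motivates the integer programming approach.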