Perfect phylogeny and haplotype assignment

Authors:
Eran Halperin;Richard M. Karp
Affiliations:
Princeton University, Princeton, NJ;International Computer Science Institute, Berkeley, CA
Venue:
RECOMB '04 Proceedings of the eighth annual international conference on Research in computational molecular biology
Year:
2004

Citing 1

Reconstructing the evolutionary history of natural languages

Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms

Cited 21

Phylogenetic Super-Networks from Partial Trees

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Optimal, efficient reconstruction of root-unknown phylogenetic networks with constrained and structured recombination

Journal of Computer and System Sciences - Special issue on bioinformatics II
Parameterized enumeration, transversals, and imperfect phylogeny reconstruction

Theoretical Computer Science - Parameterized and exact computation
Haplotyping with missing data via perfect path phylogenies

Discrete Applied Mathematics
Family trio phasing and missing data recovery

International Journal of Bioinformatics Research and Applications
Experimental analysis of a new algorithm for partial haplotype completion

International Journal of Bioinformatics Research and Applications
Boosting Haplotype Inference with Local Search

Constraints
The Undirected Incomplete Perfect Phylogeny Problem

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Influence of Tree Topology Restrictions on the Complexity of Haplotyping with Missing Data

TAMC '09 Proceedings of the 6th Annual Conference on Theory and Applications of Models of Computation
Haplotype Inference Constrained by Plausible Haplotype Data

CPM '09 Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching
Efficient haplotype inference with boolean satisfiability

AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Genome-wide compatible SNP intervals and their properties

Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
Reducing multi-state to binary perfect phylogeny with applications to missing, removable, inserted, and deleted data

WABI'10 Proceedings of the 10th international conference on Algorithms in bioinformatics
Efficiently solvable perfect phylogeny problems on binary and k-state data with missing values

WABI'11 Proceedings of the 11th international conference on Algorithms in bioinformatics
Haplotype Inference Constrained by Plausible Haplotype Data

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Phylogenetic network inferences through efficient haplotyping

WABI'06 Proceedings of the 6th international conference on Algorithms in Bioinformatics
On the complexity of SNP block partitioning under the perfect phylogeny model

WABI'06 Proceedings of the 6th international conference on Algorithms in Bioinformatics
A linear-time algorithm for the perfect phylogeny haplotyping (PPH) problem

RECOMB'05 Proceedings of the 9th Annual international conference on Research in Computational Molecular Biology
Experimental analysis of a new algorithm for partial haplotype completion

ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II
Phasing and missing data recovery in family trios

ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II
Influence of tree topology restrictions on the complexity of haplotyping with missing data

Theoretical Computer Science

Abstract

This paper is concerned with the reconstruction of perfect phylogenies from binary character data with missing values, and with the related problems of inferring complete haplotypes from haplotypes or genotypes with missing data. In cases where the problems considered are NP-hard, we assume a rich data hypothesis under which they become tractable. Natural probabilistic models are introduced for the generation of character vectors, haplotypes or genotypes with missing data, and it is shown that these models support the rich data hypothesis. The principal results include:

A near-linear time algorithm for inferring a perfect phylogeny from binary character data (or haplotype data) with missing values, under the rich data hypothesis;
A quadratic-time algorithm for inferring a perfect phylogeny from genotype data with missing values with high probability, under certain distributional assumptions;
Demonstration that the problems of maximum-likelihood inference of complete haplotypes from partial haplotypes or partial genotypes can be cast as minimum-entropy disjoint set cover problems;
In the case where the haplotypes come from a perfect phylogeny, a representation of the set cover problem as minimum-entropy covering of subtrees of a tree by nodes;
An exact algorithm for minimum-entropy subtree covering, and demonstration that it runs in polynomial time when the subtrees have small diameter;
Demonstration that a simple greedy approximation algorithm solves the minimum-entropy subtree covering problem with relative error tending to zero when the number of partial haplotypes per complete haplotype is large;
An asymptotically consistent method of estimating the frequencies of the complete haplotypes in a perfect phylogeny, under an iid model for the distribution of missing data;
Computational results on real data demonstrating the effectiveness of the greedy algorithm for inferring haplotypes from genotypes with missing data, even in the absence of a perfect phylogeny.
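
The minimum-entropy set cover view mentioned in the abstract can be illustrated with a small sketch. It is not the paper's implementation. It assumes that haplotypes are strings over 0/1, that "?" marks a missing value, that the candidate complete haplotypes are given up front, and that the greedy rule is the usual one for set cover (repeatedly take the candidate compatible with the most unassigned partial haplotypes). The names compatible, greedy_cover and entropy are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def compatible(partial, complete):
    """True if the two strings agree at every site not marked '?' (assumed encoding)."""
    return all(p == "?" or p == c for p, c in zip(partial, complete))


def greedy_cover(partials, candidates):
    """Assign each partial haplotype to one candidate, most-covering candidate first."""
    unassigned = set(range(len(partials)))
    assignment = {}
    while unassigned:
        # Pick the candidate compatible with the most still-unassigned partials.
        best = max(
            candidates,
            key=lambda c: sum(compatible(partials[i], c) for i in unassigned),
        )
        covered = {i for i in unassigned if compatible(partials[i], best)}
        if not covered:
            raise ValueError("a partial haplotype has no compatible candidate")
        for i in covered:
            assignment[i] = best
        unassigned -= covered
    return assignment


def entropy(assignment):
    """Shannon entropy (bits) of the distribution of assigned complete haplotypes."""
    counts = Counter(assignment.values())
    n = sum(counts.values())
    return -sum(k / n * log2(k / n) for k in counts.values())


# Toy data: five partial haplotypes over three sites, all 3-site strings as candidates.
partials = ["0?1", "01?", "?11", "1?0", "10?"]
candidates = ["".join(bits) for bits in product("01", repeat=3)]
assignment = greedy_cover(partials, candidates)
cover_entropy = entropy(assignment)
```

On this toy input the first three partial haplotypes are all compatible with 011 and the last two with 100, so the sketch explains the data with two complete haplotypes. A lower entropy corresponds to an assignment concentrated on fewer, more frequent haplotypes.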