Efficiently solvable perfect phylogeny problems on binary and k-state data with missing values

Authors:
Kristian Stevens;Bonnie Kirkpatrick
Affiliations:
Computer Science and Evolution and Ecology, University of California Davis;Electrical Engineering and Computer Sciences, University of California Berkeley
Venue:
WABI'11 Proceedings of the 11th international conference on Algorithms in bioinformatics
Year:
2011

Citing 13
Cited 1

Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs

SIAM Journal on Computing
A fast algorithm for reordering sparse matrices for parallel factorization

SIAM Journal on Scientific and Statistical Computing
A Polynomial-Time Algorithm for the Perfect Phylogeny Problem when the Number of Character States is Fixed

SIAM Journal on Computing
A fast algorithm for the computation and enumeration of perfect phylogenies when the number of character states is fixed

Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms
Tree compatibility and inferring evolutionary history

SODA '93 Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms
Perfect phylogeny and haplotype assignment

RECOMB '04 Proceedings of the eighth annual international conference on Research in computational molecular biology
Incomplete Directed Perfect Phylogeny

SIAM Journal on Computing
Algorithmic Graph Theory and Perfect Graphs (Annals of Discrete Mathematics, Vol 57)

Algorithmic Graph Theory and Perfect Graphs (Annals of Discrete Mathematics, Vol 57)
Efficient whole-genome association mapping using local phylogenies for unphased genotype data

Bioinformatics
The Undirected Incomplete Perfect Phylogeny Problem

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Exact Computation of Coalescent Likelihood under the Infinite Sites Model

ISBRA '09 Proceedings of the 5th International Symposium on Bioinformatics Research and Applications
The Multi-State Perfect Phylogeny Problem with Missing and Removable Data: Solutions via Integer-Programming and Chordal Graph Theory

RECOMB '09 Proceedings of the 13th Annual International Conference on Research in Computational Molecular Biology
Integer programming formulations and computations solving phylogenetic and population genetic problems with missing or genotypic data

COCOON'07 Proceedings of the 13th annual international conference on Computing and Combinatorics

Reducing problems in unrooted tree compatibility to restricted triangulations of intersection graphs

WABI'12 Proceedings of the 12th international conference on Algorithms in Bioinformatics

Abstract

The perfect phylogeny problem is of central importance to both evolutionary biology and population genetics. Missing values are common in both sequence and genotype data. In their presence, the problem of finding a perfect phylogeny is NP-hard, even for binary characters [24]. We extend the utility of the perfect phylogeny by introducing new efficient algorithms for broad classes of binary and multi-state data with missing values. Specifically, we address the rich data hypothesis introduced by Halperin and Karp [11] for the binary perfect phylogeny problem with missing data. We give an efficient algorithm for enumerating phylogenies compatible with characters satisfying the rich data hypothesis. This algorithm is useful for computing the probability of data with missing values under the coalescent model. In addition, we use the partition intersection (PI) graph and chordal graph theory to generalize the rich data hypothesis to multi-state characters with missing values. For a bounded number of states, k, we provide a fixed-parameter tractable algorithm for the k-state perfect phylogeny problem with missing data. Our approach reduces missing data problems to problems on complete data. Finally, we characterize a commonly observed condition, an m-clique in the PI graph, under which a perfect phylogeny can be found efficiently for binary characters with missing values. We evaluate our results with extensive empirical analysis using two biologically motivated generative models of character data.
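
The partition intersection (PI) graph mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. It only builds the graph under the usual definition: one vertex per (character, state) pair, and an edge between two vertices when some taxon exhibits both states. Treating a missing entry (encoded here as `None`) as contributing no vertex and no edge is an assumption of this sketch, and the function name and toy matrix are hypothetical.

```python
from itertools import combinations

MISSING = None  # assumed encoding for a missing value


def partition_intersection_graph(matrix):
    """Build the PI graph of a taxa-by-character matrix.

    Vertices are (character index, state) pairs. Two vertices are
    adjacent when at least one taxon (row) carries both states.
    Missing entries are skipped (an assumption of this sketch).
    """
    vertices = set()
    edges = set()
    for row in matrix:
        # Observed (character, state) pairs for this taxon, in column order.
        observed = [(c, s) for c, s in enumerate(row) if s is not MISSING]
        vertices.update(observed)
        # Every pair of states observed together in one taxon is an edge.
        for u, v in combinations(observed, 2):
            edges.add((u, v))
    return vertices, edges


# Toy data: 3 taxa, 3 binary characters, two missing entries.
matrix = [
    [0, 0, None],
    [0, 1, 1],
    [1, None, 1],
]
vertices, edges = partition_intersection_graph(matrix)
```

Because each row is scanned in column order, every edge is stored with the lower character index first, so no edge is recorded twice. Checking whether such a graph admits a suitable chordal completion is the harder step and is not attempted here.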