WABI'09 Proceedings of the 9th international conference on Algorithms in bioinformatics
The binary perfect phylogeny with persistent characters
Theoretical Computer Science
COCOON'07 Proceedings of the 13th annual international conference on Computing and Combinatorics
Hi-index | 3.84 |
Motivation: We explore the problem of constructing near-perfect phylogenies on bi-allelic haplotypes, where the deviation from perfect phylogeny is entirely due to homoplasy events. We present polynomial-time algorithms for restricted versions of the problem. We show that these algorithms can be extended to genotype data, in which case the problem is called the near-perfect phylogeny haplotyping (NPPH) problem. We present a near-optimal algorithm for the H1-NPPH problem, which is to determine if a given set of genotypes admit a phylogeny with a single homoplasy event. The time-complexity of our algorithm for the H1-NPPH problem is O(m2(n + m)), where n is the number of genotypes and m is the number of SNP sites. This is a significant improvement over the earlier O(n4) algorithm. We also introduce generalized versions of the problem. The H(1, q)-NPPH problem is to determine if a given set of genotypes admit a phylogeny with q homoplasy events, so that all the homoplasy events occur in a single site. We present an O(mq+1(n + m)) algorithm for the H(1,q)-NPPH problem. Results: We present results on simulated data, which demonstrate that the accuracy of our algorithm for the H1-NPPH problem is comparable to that of the existing methods, while being orders of magnitude faster. Availability: The implementation of our algorithm for the H1-NPPH problem is available upon request. Contact: rvijaya@cs.ucf.edu