When reconstructing a phylogenetic tree, one common representation of a species is a binary string indicating the presence or absence of selected genes/proteins. To date, existing methods have assumed that the presence of these genes/proteins is independent; in most cases, however, this assumption does not hold. In this paper, we consider the reconstruction problem while taking into account the dependency between proteins, i.e., protein linkage. We assume that the tree structure and the leaf sequences are given, so we need only find an optimal assignment of binary strings to the ancestral nodes. We prove that the Phylogenetic Tree Reconstruction with Protein Linkage (PTRPL) problem is NP-complete for each of three different versions of linkage distance. We provide an efficient dynamic programming algorithm that solves the general problem in $O(4^m \cdot n)$ or $O(4^m \cdot (m+n))$ time, depending on the version of linkage distance used (compared with the straightforward $O(4^m \cdot m \cdot n)$ and $O(4^m \cdot m^2 \cdot n)$ time algorithms), where $n$ stands for the number of species and $m$ for the number of proteins, i.e., the length of the binary string. We also argue, based on experiments, that trees of higher accuracy can be constructed by using linkage information than by using only Hamming distance to measure the differences between the binary strings, thus validating the significance of linkage information.
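To make the "straightforward" baseline concrete, here is a minimal Python sketch. It assumes the rooted tree is given as a hypothetical `children` dictionary and the leaf strings as tuples. It is an exhaustive Sankoff-style dynamic program, not the authors' faster algorithm. Because the three linkage distances are not defined here, it uses Hamming distance as a stand-in; the `dist` parameter (a hypothetical name) can be swapped for any pairwise string distance. Each internal node tries all $2^m$ strings against all $2^m$ strings of each child. With an $O(m)$-time distance, summed over the $O(n)$ edges of the tree, this gives the $O(4^m \cdot m \cdot n)$ bound. A distance costing $O(m^2)$ gives $O(4^m \cdot m^2 \cdot n)$.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Label = Tuple[int, ...]  # presence (1) / absence (0) of each of the m proteins


def hamming(a: Label, b: Label) -> int:
    """Number of proteins whose presence differs between two strings; O(m)."""
    return sum(x != y for x, y in zip(a, b))


def assign_ancestors(
    children: Dict[str, List[str]],
    root: str,
    leaves: Dict[str, Label],
    m: int,
    dist: Callable[[Label, Label], int] = hamming,
) -> Tuple[float, Dict[str, Label]]:
    """Exhaustive DP (hypothetical helper): label every internal node with a
    length-m binary string so that the summed edge distance is minimal."""
    labels = list(product((0, 1), repeat=m))  # all 2^m candidate strings
    cost: Dict[str, Dict[Label, float]] = {}  # cost[v][s]: best subtree cost if v = s
    best: Dict[Tuple[str, Label, str], Label] = {}  # best child label for (v, s, child)

    def fill(v: str) -> None:
        if v in leaves:  # leaf strings are fixed by the input
            cost[v] = {s: 0.0 if s == leaves[v] else float("inf") for s in labels}
            return
        for c in children[v]:
            fill(c)
        cost[v] = {}
        for s in labels:  # 2^m parent labels ...
            total = 0.0
            for c in children[v]:
                # ... times 2^m child labels, each distance evaluation costing O(m)
                t = min(labels, key=lambda u: cost[c][u] + dist(s, u))
                best[(v, s, c)] = t
                total += cost[c][t] + dist(s, t)
            cost[v][s] = total

    fill(root)
    root_label = min(labels, key=lambda s: cost[root][s])
    assignment: Dict[str, Label] = {}

    def trace(v: str, s: Label) -> None:
        # Walk down from the root, following the recorded optimal child labels.
        assignment[v] = s
        for c in children.get(v, []):
            trace(c, best[(v, s, c)])

    trace(root, root_label)
    return cost[root][root_label], assignment


if __name__ == "__main__":
    # Toy rooted tree: r -> (u, C), u -> (A, B); leaves carry m = 3 proteins.
    children = {"r": ["u", "C"], "u": ["A", "B"]}
    leaves = {"A": (1, 1, 0), "B": (1, 0, 0), "C": (0, 0, 1)}
    total, labelling = assign_ancestors(children, "r", leaves, m=3)
    print(total)  # 3.0: each of the three proteins must change state at least once
    print(labelling["r"], labelling["u"])
```

Hamming distance is additive over positions, so for Hamming alone a per-column method would already suffice. Enumerating whole strings only pays off when the distance couples proteins, as a linkage-based distance does. That coupling is exactly why the $4^m$ factor appears in the bounds above.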