Maximum likelihood (ML) is increasingly used as an optimality criterion for selecting evolutionary trees (Felsenstein, 1981), but finding the global optimum is a hard computational task. Because no general analytic solution is known, numeric techniques such as hill climbing or expectation maximization (EM) are used to find optimal parameters for a given tree. So far, analytic solutions have been derived only for the simplest model: three taxa with two-state characters under a molecular clock (MC). Quoting Ziheng Yang (2000), who initiated the analytic approach, "this seems to be the simplest case, but has many of the conceptual and statistical complexities involved in phylogenetic estimation".

In this work, we give analytic solutions for four taxa with two-state characters under a molecular clock. The change from three to four taxa incurs a major increase in the complexity of the underlying algebraic system, and requires novel techniques and approaches. We start by presenting the general maximum likelihood problem on phylogenetic trees as a constrained optimization problem, together with the resulting system of polynomial equations. In full generality, this system is infeasible to solve; we therefore develop specialized tools for the MC case.

Rooted four-taxon trees have two topologies: the fork (two subtrees with two leaves each) and the comb (one subtree with three leaves, the other with a single leaf). We combine the ultrametric properties of MC trees with the Hadamard conjugation (Hendy and Penny, 1993) to derive a number of topology-dependent identities. Employing these identities, we substantially simplify the system of polynomial equations. Finally, we use tools from algebraic geometry (e.g. Gröbner bases, ideal saturation, resultants) and employ symbolic algebra software to obtain closed-form analytic solutions (expressed parametrically in the input data) for the fork topology, and analytic solutions for the comb. We show that, in contrast to the fork, the comb has no closed-form solutions (expressed by radicals in the input data). In general, four-taxon trees can have multiple ML points (Steel, 1994; Chor et al., 2001). In contrast, we can now prove that under the MC assumption, both the fork and the comb topologies have a unique (local and global) ML point.
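
To make the objects in the abstract concrete, the following is a minimal Python sketch of the Hadamard conjugation for the smaller three-taxon case. It is not the paper's derivation or code. It assumes the symmetric two-state model, edge lengths measured in expected changes per site, and a rooted clock tree ((1,2),3) with pendant edges of length `a` to taxa 1 and 2 and an internal edge of length `b`. All function names are hypothetical. The sketch maps edge lengths to site-pattern probabilities and evaluates the log-likelihood that a numeric or analytic method would then maximize.

```python
import math


def hadamard(v):
    """Multiply v by the Sylvester Hadamard matrix, H[i][j] = (-1)^popcount(i & j)."""
    n = len(v)
    return [
        sum(v[j] * (-1) ** bin(i & j).count("1") for j in range(n))
        for i in range(n)
    ]


def sequence_spectrum(q):
    """Hadamard conjugation s = H^{-1} exp(H q), using H^{-1} = H / len(q)."""
    n = len(q)
    rho = [math.exp(x) for x in hadamard(q)]
    return [x / n for x in hadamard(rho)]


def clock_three_taxon_spectrum(a, b):
    """Pattern probabilities for the clock tree ((1,2),3); assumes a, b > 0.

    Index i is a bitmask over taxa {1, 2}: the set of taxa whose state
    differs from taxon 3. Unrooting the clock tree gives a star with edge
    lengths a, a and a + 2b; entry 0 makes the entries of q sum to zero.
    """
    q = [-(3 * a + 2 * b), a, a, a + 2 * b]
    return sequence_spectrum(q)


def log_likelihood(counts, a, b):
    """Log-likelihood of observed pattern counts (same index order as the spectrum)."""
    s = clock_three_taxon_spectrum(a, b)
    return sum(c * math.log(p) for c, p in zip(counts, s))
```

For example, `log_likelihood([70, 8, 8, 14], 0.1, 0.05)` scores one candidate pair `(a, b)` against made-up pattern counts. Under the clock, the two patterns in which only taxon 1 or only taxon 2 differs from taxon 3 receive the same probability. This is the kind of ultrametric symmetry that the topology-dependent identities exploit.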