Multiple maxima of likelihood in phylogenetic trees: an analytic approach
RECOMB '00 Proceedings of the fourth annual international conference on Computational molecular biology
Learning Belief Networks in the Presence of Missing Values and Hidden Variables
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Solving Large Scale Phylogenetic Problems using DCM2
Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology
Probabilistic hierarchical clustering for biological data
Proceedings of the sixth annual international conference on Computational biology
Hierarchical latent class models for cluster analysis
Eighteenth national conference on Artificial intelligence
Hi-index | 0.00 |
A central task in the study of evolution is the reconstruction of a phylogenetic tree from sequences of current-day taxa. A well supported approach to tree reconstruction performs maximum likelihood (ML) analysis. Unfortunately, searching for the maximum likelihood phylogenetic tree is computationally expensive. In this paper, we describe a new algorithm that uses Structural-EM for learning maximum likelihood trees. This algorithm is similar to the standard EM method for estimating branch lengths, except that during iterations of this algorithms the topology is improved as well as the branch length. The algorithm performs iterations of two steps. In the E-Step, we use the current tree topology and branch lengths to compute expected sufficient statistics, which summarize the data. In the M-Step, we search for a topology that maximizes the likelihood with respect to these expected sufficient statistics. As we show, searching for better topologies inside the M-step can be done efficiently, as opposed to standard search over topologies. We prove that each iteration of this procedure increases the likelihood of the topology, and thus the procedure must converge. We evaluate our new algorithm on both synthetic and real sequence data, and show that it is both dramatically faster and finds more plausible trees than standard search for maximum likelihood phylogenies.