Efficient approximations for learning phylogenetic HMM models from data

Authors:
Vladimir Jojic;Nebojsa Jojic;Chris Meek;Dan Geiger;Adam Siepel;David Haussler;D. Heckerman
Affiliations:
Microsoft Research, Redmond, WA 98052, USA;Microsoft Research, Redmond, WA 98052, USA;Microsoft Research, Redmond, WA 98052, USA;Technion - Israel Institute of Technology, Computer Science Department, Haifa 32000, Israel;Center for Biomolecular Science and Engineering;Center for Biomolecular Science and Engineering;Microsoft Research, Redmond, WA 98052, USA
Venue:
Bioinformatics
Year:
2004

Abstract

Motivation: We consider models for learning an evolutionary, or phylogenetic, tree from DNA sequences observed at the leaves of the tree. In particular, we consider a general probabilistic model described by Siepel and Haussler, which we call the phylogenetic-HMM model and which generalizes the classical probabilistic models of Neyman and Felsenstein. Unfortunately, computing the likelihood of phylogenetic-HMM models is intractable. We consider several approximations for computing the likelihood of such models, including an approximation introduced by Siepel and Haussler, loopy belief propagation, and several variational methods.

Results: We demonstrate that, unlike the other approximations, variational methods are accurate and are guaranteed to lower-bound the likelihood. In addition, we identify one variational approximation as the best: the one in which the posterior distribution is approximated variationally using the classic Neyman-Felsenstein model. Applying this best approximation to data from the cystic fibrosis transmembrane conductance regulator gene region across nine eutherian mammals reveals a CpG effect.
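
The lower-bound guarantee of variational methods can be illustrated on a toy problem. The sketch below is not the phylogenetic-HMM model or the authors' procedure. It assumes a hypothetical model with a single binary hidden variable h and one observation x, with made-up probabilities, and checks the standard variational bound log P(x) >= E_Q[log P(x, h)] - E_Q[log Q(h)] for a few choices of the approximating distribution Q. The names prior, likelihood and elbo are illustrative only.

```python
import math

# Hypothetical toy model (not from the paper): hidden h in {0, 1}, one observed x.
prior = [0.6, 0.4]        # P(h)
likelihood = [0.9, 0.2]   # P(x = observed value | h)

# Joint P(x, h) for the observed x, and the exact log-likelihood log P(x).
joint = [p * l for p, l in zip(prior, likelihood)]
log_px = math.log(sum(joint))


def elbo(q):
    """Variational bound E_Q[log P(x, h)] - E_Q[log Q(h)] for a distribution q over h."""
    return sum(
        qi * (math.log(ji) - math.log(qi))
        for qi, ji in zip(q, joint)
        if qi > 0  # terms with Q(h) = 0 contribute nothing
    )


# The exact posterior P(h | x) makes the bound tight; any other Q falls below it.
posterior = [j / sum(joint) for j in joint]

for q in ([0.5, 0.5], [0.9, 0.1], posterior):
    print(q, "bound:", elbo(q), "exact:", log_px)
```

In this toy case the bound never exceeds the exact log-likelihood and meets it only when Q equals the true posterior. For an intractable model the exact value is unavailable, which is why a guaranteed bound is useful when comparing approximations.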