Analytic solutions for three taxon ML trees with variable rates across sites

  • Authors:
  • Benny Chor; Michael Hendy; David Penny

  • Affiliations:
  • School of Computer Science, Tel-Aviv University, Israel; Institute of Fundamental Sciences and Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand; Institute of Molecular BioSciences and Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand

  • Venue:
  • Discrete Applied Mathematics
  • Year:
  • 2007

Abstract

We consider the problem of finding the maximum likelihood rooted tree of three species under a symmetric, molecular-clock model of substitution of 2-state characters. For identically distributed rates per site this is probably the simplest phylogenetic estimation problem, and it is readily solved numerically. Analytic solutions, on the other hand, were obtained only recently by Yang [Complexity of the simplest phylogenetic estimation problem, Proc. Roy. Soc. London Ser. B 267 (2000) 109-119]. In this work we provide analytic solutions for any distribution of rates across sites, provided the moment generating function of the distribution is strictly increasing over the negative real numbers. This class of distributions includes, among others, identical rates across sites, as well as the Gamma, the uniform, and the inverse Gaussian distributions. Our work therefore generalizes Yang's solution, and our derivation of the analytic solution is substantially simpler. We use the Hadamard conjugation to prove a general statement about the edge lengths of any neighboring pair of leaves in any phylogenetic tree (on three or more taxa). We then employ this relation, in conjunction with the convexity of an entropy-like function, to derive the analytic solution.
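
The abstract notes that the three-taxon problem is readily solved numerically. The following Python code is a minimal sketch of that numerical route, not the authors' analytic solution. It assumes branch lengths are measured in expected substitutions per site, that the rate distribution has mean 1, and that site patterns are pooled into four classes (all taxa equal, or exactly one taxon differing). All function names are hypothetical. Under these assumptions the pattern probabilities for the clock tree ((1,2),3) depend on the rate distribution only through its moment generating function evaluated at negative arguments.

```python
import math

def mgf_identical(t):
    """MGF of a point mass at rate 1 (identical rates across sites)."""
    return math.exp(t)

def mgf_gamma(alpha):
    """MGF of a Gamma distribution with shape alpha and mean 1 (assumed scaling)."""
    def mgf(t):
        return (1.0 - t / alpha) ** (-alpha)
    return mgf

def pattern_probs(t1, t0, mgf):
    """Pattern-class probabilities for the clock tree ((1,2),3).

    t1 is the height of the ancestor of taxa 1 and 2, t0 the root height,
    with 0 <= t1 <= t0. Path lengths are d12 = 2*t1 and d13 = d23 = 2*t0.
    Returns (all equal, taxon 3 differs, taxon 2 differs, taxon 1 differs).
    """
    a = mgf(-4.0 * t1)  # M(-2 * d12)
    b = mgf(-4.0 * t0)  # M(-2 * d13) = M(-2 * d23)
    p_same = (1.0 + a + 2.0 * b) / 4.0
    p_3 = (1.0 + a - 2.0 * b) / 4.0
    p_1 = (1.0 - a) / 4.0  # same value when taxon 2 differs
    return (p_same, p_3, p_1, p_1)

def log_likelihood(counts, t1, t0, mgf):
    """Multinomial log-likelihood (up to a constant) of the pattern counts."""
    total = 0.0
    for n, p in zip(counts, pattern_probs(t1, t0, mgf)):
        if n > 0:
            if p <= 0.0:
                return float("-inf")
            total += n * math.log(p)
    return total

def grid_ml(counts, mgf, step=0.01, t_max=2.0):
    """Coarse grid search over 0 <= t1 <= t0 <= t_max; returns (ll, t1, t0)."""
    steps = int(round(t_max / step))
    best = (float("-inf"), 0.0, 0.0)
    for i in range(steps + 1):
        for j in range(i, steps + 1):
            t1, t0 = i * step, j * step
            ll = log_likelihood(counts, t1, t0, mgf)
            if ll > best[0]:
                best = (ll, t1, t0)
    return best
```

This sketch fits edge lengths for a single rooted topology only. A full search would repeat it with the taxon labels permuted to cover the other two rooted topologies and keep the one with the highest likelihood.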