Sequence Length Requirement of Distance-Based Phylogeny Reconstruction: Breaking the Polynomial Barrier

  • Authors:
  • Sébastien Roch

  • Venue:
  • FOCS '08 Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of Computer Science
  • Year:
  • 2008

Abstract

We introduce a new distance-based phylogeny reconstruction technique which provably achieves, at sufficiently short branch lengths, a sequence length requirement growing slower than any polynomial. The technique is based on a new averaging procedure that implicitly reconstructs ancestral sequences.

By the same token, we extend previous results on phase transitions in phylogeny reconstruction to general time-reversible models. More precisely, we show that in the so-called Kesten-Stigum zone (roughly, a region of the parameter space where ancestral sequences are well approximated by "linear combinations" of observed sequences), sequences of length $e^{\sqrt{\log n}}$ suffice for reconstruction. Here $n$ is the number of extant species. We improve this result to $\mathrm{poly}(\log n)$ in the ultrametric case. Surprisingly, this last result suggests that a UPGMA-type algorithm may in some sense be "optimal" under a molecular clock.

Our results challenge, to some extent, the conventional wisdom that estimates of evolutionary distances alone carry significantly less information about phylogenies than full sequence datasets.
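
To see where the two sequence lengths quoted in the abstract sit relative to each other, the following comparison may help. It is an illustrative calculation, not taken from the paper; the constants $\varepsilon > 0$ and $k \geq 1$ are symbols introduced here and are assumed fixed as $n \to \infty$.

$$
(\log n)^k \;=\; e^{k \log\log n} \;\ll\; e^{\sqrt{\log n}} \;\ll\; e^{\varepsilon \log n} \;=\; n^{\varepsilon},
$$

since $k \log\log n = o(\sqrt{\log n})$ and $\sqrt{\log n} = o(\log n)$. So $e^{\sqrt{\log n}}$ grows slower than any polynomial in $n$ but faster than any fixed power of $\log n$, which is why the $\mathrm{poly}(\log n)$ bound in the ultrametric case is a strict improvement.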