The number of nucleotide sites needed to accurately reconstruct large evolutionary trees

  • Authors:
  • Mike Steel;Laszlo A. Szekely;Peter L. Erdos

  • Venue:
  • The number of nucleotide sites needed to accurately reconstruct large evolutionary trees
  • Year:
  • 1996

Abstract

Biologists seek to reconstruct evolutionary trees for an increasing number of species, $n$, from aligned genetic sequences. How fast must the sequence length $N$ grow, as a function of $n$, in order to accurately recover the underlying tree with probability $1-\epsilon$, if the sequences evolve according to simple stochastic models of nucleotide substitution? We show that for a certain model, a reconstruction method exists for which the sequence length $N$ can grow surprisingly slowly with $n$ (sublinearly for a wide range of parameters, and even as a power of $\log n$ in a narrow range, which roughly meets the lower bound from information theory). By contrast, a more traditional technique (maximum compatibility) provably requires $N$ to grow faster than linearly in $n$. Our approach is based on a new and computationally efficient method for reconstructing phylogenetic trees from aligned DNA sequences.
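
The information-theoretic lower bound mentioned in the abstract can be illustrated with a simple counting argument. The following is a minimal sketch under stated assumptions, not the authors' derivation: it assumes a deterministic reconstruction method, unrooted binary trees on $n$ labelled leaves (of which there are $(2n-5)!!$), and an alphabet of 4 nucleotides. Such a method maps each of the $4^{nN}$ possible data sets to a single tree, so it can only ever output every tree if $4^{nN} \ge (2n-5)!!$. The function names are hypothetical and chosen for illustration only.

```python
def num_unrooted_binary_trees(n: int) -> int:
    """Count unrooted binary trees on n labelled leaves: (2n-5)!! for n >= 3."""
    if n < 3:
        raise ValueError("n must be at least 3")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product when n == 3).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


def counting_lower_bound(n: int, alphabet_size: int = 4) -> int:
    """Smallest N such that alphabet_size**(n*N) >= (2n-5)!!.

    Assumption: a deterministic method returns one tree per data set, so it
    needs at least as many distinct data sets as there are candidate trees.
    """
    trees = num_unrooted_binary_trees(n)
    N = 0
    # Exact integer arithmetic, so no floating-point rounding issues.
    while alphabet_size ** (n * N) < trees:
        N += 1
    return N


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, counting_lower_bound(n))
```

Because $\log\,(2n-5)!!$ grows like $n \log n$, dividing by $n \log 4$ gives a bound on $N$ that grows like $\log n$, which is the growth rate the abstract compares against.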