Efficient algorithms for inverting evolution
Journal of the ACM (JACM)
Biologists seek to reconstruct evolutionary trees for an increasing number of species, $n$, from aligned genetic sequences. If the sequences evolve according to simple stochastic models of nucleotide substitution, how fast must the sequence length $N$ grow, as a function of $n$, for the underlying tree to be recovered correctly with probability $1-\epsilon$? We show that, for a certain model, there is a reconstruction method for which $N$ can grow surprisingly slowly with $n$: sublinearly over a wide range of parameters, and even as a power of $\log n$ over a narrow range, which roughly matches the information-theoretic lower bound. By contrast, a more traditional technique (maximum compatibility) provably requires $N$ to grow faster than linearly in $n$. Our results rest on a new, computationally efficient method for reconstructing phylogenetic trees from aligned DNA sequences.