Efficient algorithms for inverting evolution

Authors:
Martin Farach;Sampath Kannan
Affiliations:
Rutgers Univ., New Brunswick, NJ;Univ. of Pennsylvania, Philadelphia
Venue:
Journal of the ACM (JACM)
Year:
1999

Abstract

Evolution can be mathematically modelled by a stochastic process that operates on the DNA of species. Such models are based on the established theory that the DNA sequences, or genomes, of all extant species have been derived from the genome of the common ancestor of all species by a process of random mutation and natural selection.

A stochastic model of evolution can be used to construct phylogenies, or evolutionary trees, for a set of species. Maximum Likelihood Estimation (MLE) methods seek the evolutionary tree which is most likely to have produced the DNA under consideration. While these methods are intellectually satisfying, they have not been widely accepted because of their computational intractability.

In this paper, we address the intractability of MLE methods as follows: We introduce a metric on stochastic process models of evolution. We show that this metric is meaningful by proving that in order for any algorithm to distinguish between two stochastic models that are close according to this metric, it needs to be given many observations. We complement this result with a simple and efficient algorithm for inverting the stochastic process of evolution, that is, for building a tree from observations on two-state characters. (We will use the same techniques in a subsequent paper to solve the problem for multistate characters, and hence for building a tree from DNA sequence data.) The tree we build is provably close, in our metric, to the tree generating the data and gets closer as more observations become available.

Though there have been many heuristics suggested for the problem of finding good approximations to the most likely tree, our algorithm is the first one with a guaranteed convergence rate, and further, this rate is within a polynomial of the lower-bound rate we establish. Ours is also the first polynomial-time algorithm that is proven to converge at all to the correct tree.
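To make "building a tree from observations on two-state characters" concrete, the following is a minimal sketch and not the algorithm of the paper, whose details the abstract does not give. It assumes a symmetric two-state model in which a uniformly random root bit flips independently on each edge with a fixed probability, and it recovers the topology of a four-leaf tree with the standard four-point test on log-corrected distances. The tree `TREE`, its flip probabilities, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random
from itertools import combinations

# Hypothetical 4-leaf tree ((a,b),(c,d)); each edge carries an assumed
# flip probability. Maps parent -> list of (child, flip_probability).
TREE = {
    "root": [("u", 0.1), ("v", 0.1)],
    "u": [("a", 0.1), ("b", 0.1)],
    "v": [("c", 0.1), ("d", 0.1)],
}
LEAVES = ["a", "b", "c", "d"]


def sample_character(rng):
    """Draw one two-state character: a uniform root bit that flips
    independently on each edge with that edge's probability."""
    states = {"root": rng.randint(0, 1)}
    stack = ["root"]
    while stack:
        node = stack.pop()
        for child, p in TREE.get(node, []):
            flipped = rng.random() < p
            states[child] = states[node] ^ int(flipped)
            stack.append(child)
    return {leaf: states[leaf] for leaf in LEAVES}


def estimate_distances(samples):
    """Estimate pairwise distances from leaf observations.

    Under the assumed model, -0.5 * ln(1 - 2q) is additive along paths,
    where q is the probability that two leaves disagree."""
    dist = {}
    for x, y in combinations(LEAVES, 2):
        q = sum(s[x] != s[y] for s in samples) / len(samples)
        q = min(q, 0.499)  # keep the logarithm defined for noisy estimates
        dist[frozenset((x, y))] = -0.5 * math.log(1 - 2 * q)
    return dist


def best_quartet_split(dist):
    """Four-point test: the true split has the smallest pairwise sum."""
    def d(x, y):
        return dist[frozenset((x, y))]

    sums = {
        "ab|cd": d("a", "b") + d("c", "d"),
        "ac|bd": d("a", "c") + d("b", "d"),
        "ad|bc": d("a", "d") + d("b", "c"),
    }
    return min(sums, key=sums.get)


rng = random.Random(0)
samples = [sample_character(rng) for _ in range(5000)]
dist = estimate_distances(samples)
split = best_quartet_split(dist)
print(split)
```

Reducing the number of samples makes the estimated distances noisier and the test less reliable, which is the kind of dependence on the number of observations that the abstract's convergence-rate and lower-bound results quantify.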