A Linear Time Approximation Scheme for Maximum Quartet Consistency on Sparse Sampled Inputs

  • Authors:
  • Sagi Snir;Raphael Yuster

  • Affiliations:
  • ssagi@math.haifa.ac.il;raphy@math.haifa.ac.il

  • Venue:
  • SIAM Journal on Discrete Mathematics
  • Year:
  • 2011

Abstract

Phylogenetic tree reconstruction is a fundamental biological problem. Quartet amalgamation, the task of combining a set of trees over four taxa into a tree over the full set, stands at the heart of many phylogenetic reconstruction methods and has attracted much theoretical and practical work. However, even reconstruction from a consistent set of quartet trees, i.e., one in which all quartets agree with some tree, is NP-hard, and the best approximation ratio known is $1/3$. For a dense input of $\Theta(n^4)$ quartets that are not necessarily consistent, the problem has a polynomial time approximation scheme. As the number of taxa grows, considering such dense inputs becomes impractical and some sampling approach is imperative. It is known that given a randomly sampled consistent set of quartets from an unknown phylogeny, one can find, in polynomial time and with high probability, a tree satisfying a $0.425$ fraction of them, an improvement over the $1/3$ ratio. In this paper we further show that given a randomly sampled consistent set of quartets from an unknown phylogeny, where the size of the sample is at least $\Theta(n^2 \log n)$, there is a randomized approximation scheme that runs in time linear in the number of quartets. The previously known polynomial time approximation scheme for this problem required a very dense sample of size $\Theta(n^4)$. We note that samples of size $\Theta(n^2 \log n)$ are sparse in the full quartet set, which contains $\binom{n}{4} = \Theta(n^4)$ quartets. The result is obtained by a combinatorial technique that may be of independent interest.
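
To make the objective in the abstract concrete, the following is a minimal Python sketch of the quantities involved: the quartet a tree induces on four taxa, a random sample of consistent quartets, and the fraction of a quartet set that a candidate tree satisfies. It is not the paper's approximation scheme. All names (`TRUE_TREE`, `leaf_distances`, `induced_quartet`, `sample_quartets`, `all_quartets`, `fraction_satisfied`) and the five-taxon example tree are hypothetical, and the sketch assumes an unrooted binary tree with taxa at the leaves and unit edge lengths.

```python
import itertools
import random
from collections import deque

# Hypothetical example: an unrooted caterpillar tree on taxa a..e,
# with internal nodes x, y, z, stored as an adjacency dict.
TRUE_TREE = {
    "a": ["x"], "b": ["x"], "c": ["y"], "d": ["z"], "e": ["z"],
    "x": ["a", "b", "y"], "y": ["x", "c", "z"], "z": ["y", "d", "e"],
}


def leaf_distances(adj, leaves):
    """Path lengths (edge counts) from each leaf to every node, via BFS."""
    dist = {}
    for src in leaves:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[src] = seen
    return dist


def induced_quartet(dist, a, b, c, d):
    """Split induced on {a, b, c, d}, as a frozenset of two frozenset pairs.

    Four-point condition: in a binary tree, the pairing with the smallest
    sum of within-pair distances is the one separated by an internal edge.
    """
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    best = min(pairings, key=lambda p: dist[p[0][0]][p[0][1]] + dist[p[1][0]][p[1][1]])
    return frozenset(frozenset(pair) for pair in best)


def all_quartets(dist, leaves):
    """Every quartet induced by the tree (the dense Theta(n^4) input)."""
    return [induced_quartet(dist, *four) for four in itertools.combinations(leaves, 4)]


def sample_quartets(dist, leaves, m, rng):
    """m quartets drawn uniformly at random (with repetition) from the tree."""
    return [induced_quartet(dist, *rng.sample(leaves, 4)) for _ in range(m)]


def fraction_satisfied(dist, quartets):
    """Fraction of the given quartets that the tree behind `dist` induces."""
    hits = 0
    for q in quartets:
        a, b, c, d = sorted(x for pair in q for x in pair)
        if induced_quartet(dist, a, b, c, d) == q:
            hits += 1
    return hits / len(quartets)
```

Under these assumptions, a sample drawn from `TRUE_TREE` is consistent by construction, so `fraction_satisfied` returns 1.0 for that tree; a different candidate tree on the same taxa generally scores lower, and maximizing this fraction over candidate trees is the Maximum Quartet Consistency objective.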