Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity

  • Authors:
  • Alexandr Andoni; Robert Krauthgamer; Krzysztof Onak

  • Venue:
  • FOCS '10 Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science
  • Year:
  • 2010

Abstract

We present a near-linear time algorithm that approximates the edit distance between two strings within a polylogarithmic factor. For strings of length $n$ and every fixed $\epsilon > 0$, the algorithm computes a $(\log n)^{O(1/\epsilon)}$ approximation in $n^{1+\epsilon}$ time. This is an \emph{exponential} improvement over the previously known approximation factor, $2^{\tilde O(\sqrt{\log n})}$, with a comparable running time [Ostrovsky and Rabani, J. ACM 2007; Andoni and Onak, STOC 2009]. This result arises naturally in the study of a new \emph{asymmetric query} model. In this model, the input consists of two strings $x$ and $y$, and an algorithm can access $y$ in an unrestricted manner, while being charged for querying every symbol of $x$. Indeed, we obtain our main result by designing an algorithm that makes a small number of queries in this model. We then provide a nearly-matching lower bound on the number of queries. Our lower bound is the first to expose hardness of edit distance stemming from the input strings being ``repetitive'', which means that many of their substrings are approximately identical. Consequently, our lower bound provides the first rigorous separation between edit distance and Ulam distance.
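
To make the asymmetric query model concrete, the following is a minimal illustrative sketch and not the algorithm from the paper. It wraps $x$ in a hypothetical `CountingOracle` that charges one query per distinct position read, while $y$ is an ordinary string with free access. Charging per distinct position is an assumption made here for illustration. The textbook quadratic-time dynamic program is used as a baseline: it computes the exact edit distance but is charged for all $n$ symbols of $x$, which is the cost a query-efficient algorithm in this model aims to avoid.

```python
class CountingOracle:
    """Hypothetical wrapper around x: reads are charged, one per distinct position."""

    def __init__(self, x):
        self._x = x
        self._seen = set()  # positions of x that have been queried so far

    def __len__(self):
        # The length of x is assumed to be known for free.
        return len(self._x)

    def query(self, i):
        """Return x[i] and record position i as queried."""
        self._seen.add(i)
        return self._x[i]

    @property
    def queries(self):
        """Number of distinct positions of x read so far."""
        return len(self._seen)


def edit_distance(x_oracle, y):
    """Exact edit distance via the textbook O(n*m) dynamic program.

    x is accessed only through the oracle; y is accessed without charge.
    This baseline reads every symbol of x.
    """
    n, m = len(x_oracle), len(y)
    prev = list(range(m + 1))  # distances from the empty prefix of x
    for i in range(1, n + 1):
        xi = x_oracle.query(i - 1)  # one charged query per row
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if xi == y[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete x[i-1]
                         cur[j - 1] + 1,      # insert y[j-1]
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    return prev[m]


oracle = CountingOracle("kitten")
distance = edit_distance(oracle, "sitting")
print(distance, oracle.queries)
```

Here the baseline is charged `len(x)` queries regardless of the input, whereas the abstract describes an algorithm that makes only a small number of queries to $x$ at the price of a polylogarithmic approximation factor.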