Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity

Authors:
Alexandr Andoni; Robert Krauthgamer; Krzysztof Onak
Venue:
FOCS '10 Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science
Year:
2010

Abstract

We present a near-linear time algorithm that approximates the edit distance between two strings within a polylogarithmic factor. For strings of length $n$ and every fixed $\varepsilon > 0$, the algorithm computes a $(\log n)^{O(1/\varepsilon)}$ approximation in $n^{1+\varepsilon}$ time. This is an \emph{exponential} improvement over the previously known approximation factor, $2^{\tilde O(\sqrt{\log n})}$, with a comparable running time [Ostrovsky and Rabani, J. ACM 2007; Andoni and Onak, STOC 2009]. This result arises naturally in the study of a new \emph{asymmetric query} model. In this model, the input consists of two strings $x$ and $y$; an algorithm can access $y$ in an unrestricted manner but is charged for every symbol of $x$ it queries. Indeed, we obtain our main result by designing an algorithm that makes a small number of queries in this model. We then provide a nearly matching lower bound on the number of queries. Our lower bound is the first to expose hardness of edit distance stemming from the input strings being "repetitive", meaning that many of their substrings are approximately identical. Consequently, our lower bound provides the first rigorous separation between edit distance and Ulam distance.
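To make the cost model of the asymmetric query model concrete, here is a minimal Python sketch built on our own assumptions. The names `QueryCountingString` and `edit_distance` are hypothetical and do not come from the paper. The wrapper charges one query per access to a symbol of $x$, while $y$ is read freely. The function runs the classic quadratic dynamic program for exact (unit-cost insertion, deletion, substitution) edit distance. This DP reads each symbol of $x$ exactly once and therefore spends $n$ queries. It is only a trivial baseline for the model. The paper's approximation algorithm, which uses a small number of queries, is not reproduced here.

```python
from typing import Sequence


class QueryCountingString:
    """Hypothetical wrapper for the charged string x: every symbol access
    increments a query counter. The length is assumed to be known for free."""

    def __init__(self, s: Sequence[str]) -> None:
        self._s = s
        self.queries = 0

    def __len__(self) -> int:
        return len(self._s)  # not charged (assumption of this sketch)

    def __getitem__(self, i: int) -> str:
        self.queries += 1  # one charged query per symbol read
        return self._s[i]


def edit_distance(x, y: Sequence[str]) -> int:
    """Exact unit-cost edit distance via the O(len(x) * len(y)) dynamic program.
    Each symbol of x is read once (one charged query per row); y is read freely."""
    m = len(y)
    prev = list(range(m + 1))  # distances from the empty prefix of x to prefixes of y
    for i in range(1, len(x) + 1):
        xi = x[i - 1]  # the only access to x in this row
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            cur[j] = min(
                prev[j] + 1,                     # delete x[i-1]
                cur[j - 1] + 1,                  # insert y[j-1]
                prev[j - 1] + (xi != y[j - 1]),  # substitute, or match at cost 0
            )
        prev = cur
    return prev[m]
```

For example, with `x = QueryCountingString("kitten")`, the call `edit_distance(x, "sitting")` returns 3, after which `x.queries` is 6, one query per symbol of $x$.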