The Computational Hardness of Estimating Edit Distance

  • Authors: Alexandr Andoni; Robert Krauthgamer

  • Affiliations: andoni@mit.edu; robert.krauthgamer@weizmann.ac.il

  • Venue: SIAM Journal on Computing
  • Year: 2010

Abstract

We prove the first nontrivial communication complexity lower bound for the problem of estimating the edit distance (also known as Levenshtein distance) between two strings. To the best of our knowledge, this is the first computational setting in which the complexity of estimating the edit distance is provably larger than that of Hamming distance. Our lower bound exhibits a trade-off between approximation and communication, asserting, for example, that protocols with $O(1)$ bits of communication can obtain only approximation $\alpha\geq\Omega(\log d/\log\log d)$, where $d$ is the length of the input strings. This case of $O(1)$ communication is of particular importance since it captures constant-size sketches as well as embeddings into spaces like $\ell_1$ and squared-$\ell_2$, two prevailing algorithmic approaches for dealing with edit distance. Indeed, the known nontrivial communication upper bounds are all derived from embeddings into $\ell_1$. By excluding low-communication protocols for edit distance, we rule out a strictly richer class of algorithms than previous results. Furthermore, our lower bound holds not only for strings over a binary alphabet but also for strings that are permutations (also known as the Ulam metric). For this case, our bound nearly matches an upper bound known via embedding the Ulam metric into $\ell_1$. Our proof uses a new technique that relies on Fourier analysis in a rather elementary way.
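
For readers unfamiliar with the two distances compared in the abstract, the following is a minimal illustrative sketch in Python. It is not taken from the paper: the function names `hamming_distance` and `edit_distance` are hypothetical, and the edit distance is computed with the textbook quadratic-time dynamic program rather than any sketching or embedding method discussed by the authors. The example inputs are chosen to show how a cyclic shift makes the Hamming distance large while the edit distance stays small, both for binary strings and for permutations (the Ulam setting).

```python
def hamming_distance(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(x, y))


def edit_distance(x, y):
    """Levenshtein distance: the minimum number of single-symbol insertions,
    deletions, and substitutions needed to turn x into y.

    Textbook dynamic program (an illustrative assumption, not the paper's
    method); it keeps only the previous row of the table.
    """
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, a in enumerate(x, 1):
        curr = [i]  # turning x[:i] into the empty string takes i deletions
        for j, b in enumerate(y, 1):
            curr.append(min(
                prev[j] + 1,             # delete a from x
                curr[j - 1] + 1,         # insert b into x
                prev[j - 1] + (a != b),  # substitute a by b (free if equal)
            ))
        prev = curr
    return prev[-1]


# Binary strings: y is x shifted cyclically by one position.
x = "0101010101"
y = "1010101010"
shift_hamming = hamming_distance(x, y)  # every position differs
shift_edit = edit_distance(x, y)        # one deletion plus one insertion

# Permutations (the Ulam setting): q is p with its first symbol moved to the end.
p = (1, 2, 3, 4, 5)
q = (2, 3, 4, 5, 1)
perm_hamming = hamming_distance(p, q)
perm_edit = edit_distance(p, q)
```

In both examples the Hamming distance equals the full length of the input while the edit distance is 2, which gives some intuition for why edit distance does not reduce coordinate by coordinate the way Hamming distance does.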