A sublinear algorithm for weakly approximating edit distance

  • Authors:
  • Tugkan Batu;Funda Ergün;Joe Kilian;Avner Magen;Sofya Raskhodnikova;Ronitt Rubinfeld;Rahul Sami

  • Affiliations:
  • University of Pennsylvania, Philadelphia, PA;Case Western Reserve University;NEC Laboratories America;University of Toronto, Toronto, ON, CANADA;MIT, Cambridge, MA;NEC Laboratories America;Yale University, New Haven, CT

  • Venue:
  • Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
  • Year:
  • 2003

Quantified Score

Hi-index 0.01

Visualization

Abstract

We show how to determine whether the edit distance between two given strings is small in sublinear time. Specifically, we present a test which, given two n-character strings A and B, runs in time o(n) and with high probability returns "CLOSE" if their edit distance is O(nΑ), and "FAR" if their edit distance is Ω(n), where Α is a fixed parameter less than 1. Our algorithm for testing the edit distance works by recursively subdividing the strings A and B into smaller substrings and looking for pairs of substrings in A, B with small edit distance. To do this, we query both strings at random places using a special technique for economizing on the samples which does not pick the samples independently and provides better query and overall complexity. As a result, our test runs in time Õ(nmax(Α/2, 2Α - 1\)) for any fixed Α Α/2) on the query complexity of every algorithm that distinguishes pairs of strings with edit distance at most nΑ from those with edit distance at least n/6.