Approximate Nearest Neighbor under edit distance via product metrics

  • Authors:
  • Piotr Indyk

  • Affiliations:
  • MIT, Cambridge, MA

  • Venue:
  • SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present a data structure for the approximate nearest neighbor problem under edit metric (which is defined as the minimum number of insertions, deletions and character substitutions needed to transform one string into another). For any l ≥ 1 and a set of n strings of length d, the data structure reports a 3l-approximate Nearest Neighbor for any given query string q in O(d) time. The space requirement of this data structure is roughly O(nd1/(l+1)), i.e., strongly subexponential. To our knowledge, this is the first data structure for this problem with both o(n) query time and storage subexponential in d.