A Metric Index for Approximate String Matching

  • Authors:
  • Edgar Chávez; Gonzalo Navarro

  • Venue:
  • LATIN '02 Proceedings of the 5th Latin American Symposium on Theoretical Informatics
  • Year:
  • 2002

Abstract

We present a radically new indexing approach for approximate string matching. The scheme uses the metric properties of the edit distance and can be applied to any other metric between strings. We build a metric space whose sites are the nodes of the suffix tree of the text, and we treat the approximate query as a proximity query on that metric space. This allows us to find the R occurrences of a pattern of length m in a text of length n in $O(m \log^2 n + m^2 + R)$ average time, using $O(n \log n)$ space and $O(n \log^2 n)$ index construction time. This complexity is far better than that of all previous methods. We also show a simpler scheme that needs only $O(n)$ space.
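The key idea above is that edit distance is a metric, so the triangle inequality can rule out candidate strings without comparing them to the pattern. The Python sketch below illustrates that general idea with a simple pivot-based range query. It is not the authors' index and does not reach the stated complexity bounds. It rests on several assumptions. The sites are all distinct text substrings up to a hypothetical length bound `max_len`, a naive stand-in for the suffix tree nodes. The pivots are picked naively. The names `PivotIndex`, `substring_sites`, and `range_query` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic DP, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # substitute or match
        prev = cur
    return prev[-1]


def substring_sites(text: str, max_len: int) -> list:
    """Hypothetical site set: all distinct substrings of length <= max_len.
    A naive stand-in for suffix tree nodes, quadratic in len(text)."""
    return sorted({text[i:j]
                   for i in range(len(text))
                   for j in range(i + 1, min(len(text), i + max_len) + 1)})


class PivotIndex:
    """Hypothetical pivot table: precomputed distances from each site to a few pivots."""

    def __init__(self, sites, num_pivots=2):
        self.sites = list(sites)
        self.pivots = self.sites[:num_pivots]  # naive pivot choice (assumption)
        self.table = [[edit_distance(s, p) for p in self.pivots] for s in self.sites]

    def range_query(self, query: str, radius: int) -> list:
        """Return all sites s with edit_distance(query, s) <= radius."""
        dq = [edit_distance(query, p) for p in self.pivots]
        result = []
        for site, ds in zip(self.sites, self.table):
            # Triangle inequality: d(q, s) >= |d(q, p) - d(s, p)| for every pivot p,
            # so a site can be discarded without computing d(q, s).
            if any(abs(a - b) > radius for a, b in zip(dq, ds)):
                continue
            # Surviving candidates are verified directly.
            if edit_distance(query, site) <= radius:
                result.append(site)
        return result
```

For example, `PivotIndex(substring_sites("abracadabra", 5), num_pivots=3).range_query("abra", 1)` returns every substring within edit distance 1 of `abra`. The filter never discards a true answer, so the result matches a brute-force scan. It only saves distance computations when the pivots separate the sites well.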