Fast Multi-dimensional Approximate Pattern Matching

  • Authors:
  • Gonzalo Navarro; Ricardo A. Baeza-Yates

  • Venue:
  • CPM '99 Proceedings of the 10th Annual Symposium on Combinatorial Pattern Matching
  • Year:
  • 1999

Abstract

We address the problem of approximate string matching in $d$ dimensions, that is, to find a pattern of size $m^d$ in a text of size $n^d$ with at most $k < m^d$ errors (substitutions, insertions and deletions along any dimension). We use a novel and very flexible error model, for which there exists only an algorithm to evaluate the similarity between two elements in two dimensions at $O(m^4)$ time. We extend the algorithm to $d$ dimensions, at $O(d!\,m^{2d})$ time and $O(d!\,m^{2d-1})$ space. We also give the first search algorithm for such model, which is $O(d!\,m^d n^d)$ time and $O(d!\,m^d n^{d-1})$ space. We show how to reduce the space cost to $O(d!\,3^d m^{2d-1})$ with little time penalty. Finally, we present the first sublinear-time (on average) searching algorithm (i.e. not all text cells are inspected), which is $O(k n^d / m^{d-1})$ for $k < (m/(d(\log_\sigma m - \log_\sigma d)))^{d-1}$, where $\sigma$ is the alphabet size. After that error level the filter still remains better than dynamic programming for $k \le m^{d-1}/(d(\log_\sigma m - \log_\sigma d))^{(d-1)/d}$. These are the first search algorithms for the problem. As side-effects we extend to $d$ dimensions an already proposed algorithm for two-dimensional exact string matching, and we obtain a sublinear-time filter to search in $d$ dimensions allowing $k$ mismatches.
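
For intuition only: with d = 1, the search bounds quoted in the abstract reduce to O(mn) time and O(m) space, which are the costs of the classical one-dimensional dynamic-programming search for a pattern with at most k edit errors. The Python sketch below shows that one-dimensional base case under the usual unit-cost edit distance. It is not the authors' d-dimensional algorithm or error model, and the function name `approx_search` is a hypothetical choice made for this illustration.

```python
def approx_search(pattern, text, k):
    """Return 1-based end positions in `text` where `pattern` occurs
    with at most k substitutions, insertions or deletions.

    Classical one-dimensional column-wise dynamic programming:
    O(m * n) time and O(m) space (hypothetical helper, illustration only).
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any
    # substring of the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # old value of col[i - 1]
        col[0] = 0          # an occurrence may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(
                prev_diag + cost,  # match or substitution
                old + 1,           # extra text character
                col[i - 1] + 1,    # missing pattern character
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_search("abc", "xxabcxx", 0))
    print(approx_search("abc", "xxabdxx", 1))
```

Only one column of the dynamic-programming matrix is kept at a time, which is where the O(m) space in the one-dimensional case comes from.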