Faster Approximate String Matching for Short Patterns

  • Authors: Philip Bille

  • Affiliations: Technical University of Denmark, 2800, Kgs. Lyngby, Denmark

  • Venue: Theory of Computing Systems
  • Year: 2012

Abstract

We study the classical approximate string matching problem: given strings P and Q and an error threshold k, find all ending positions of substrings of Q whose edit distance to P is at most k. Let P and Q have lengths m and n, respectively. On a standard unit-cost word RAM with word size w ≥ log n, we present an algorithm using time $$O\biggl(nk \cdot \min\biggl(\frac{\log^2 m}{\log n},\frac{\log^2 m\log w}{w}\biggr) + n\biggr).$$ When P is short, namely, $m = 2^{o(\sqrt{\log n}\,)}$ or $m =2^{o(\sqrt{w/\log w}\,)}$, this improves the previously best known time bounds for the problem. The result is achieved using a novel implementation of the Landau-Vishkin algorithm based on tabulation and word-level parallelism.
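
To make the problem statement concrete, the following is a minimal sketch of the textbook dynamic-programming solution, which runs in O(nm) time. It is not the algorithm of the paper and does not use the Landau-Vishkin technique, tabulation, or word-level parallelism; it only illustrates what "ending positions of substrings of Q whose edit distance to P is at most k" means. The function name `approx_match_end_positions` and the choice of 1-based end positions are assumptions made for this illustration.

```python
def approx_match_end_positions(pattern, text, k):
    """Return the 1-based end positions j in `text` such that some substring
    of `text` ending at j has edit distance at most k to `pattern`.

    Illustrative O(n*m) baseline only; NOT the algorithm from the paper.
    """
    m = len(pattern)
    # col[i] holds the minimum edit distance between pattern[:i] and any
    # substring of the text ending at the current text position.
    # Before reading any text, matching pattern[:i] costs i deletions.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        # col[0] stays 0: a match may start at any text position for free.
        prev_diag = col[0]
        for i in range(1, m + 1):
            old = col[i]  # value for the previous text position
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(
                prev_diag + cost,  # match or substitution
                old + 1,           # extra text character
                col[i - 1] + 1,    # extra pattern character
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_match_end_positions("abc", "xxabcxx", 1))
```

With P = "abc", Q = "xxabcxx" and k = 1, the sketch reports end positions 4, 5 and 6, corresponding to the substrings "ab", "abc" and "abcx". With k = 0 only position 5 (the exact occurrence) is reported.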