Approximate pattern matching with k-mismatches in packed text

  • Authors:
  • Emanuele Giaquinta; Szymon Grabowski; Kimmo Fredriksson

  • Affiliations:
  • Department of Computer Science, University of Helsinki, Finland; Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Poland; School of Computing, University of Eastern Finland, P.O. Box 1627, FI-70211 Kuopio, Finland

  • Venue:
  • Information Processing Letters
  • Year:
  • 2013

Abstract

Given strings $P$ of length $m$ and $T$ of length $n$ over an alphabet of size $\sigma$, the string matching with $k$-mismatches problem is to find the positions of all the substrings of $T$ that are at Hamming distance at most $k$ from $P$. If $T$ can be read only one character at a time, the best known bounds are $O(n\sqrt{k\log k})$ and $O(n + n\sqrt{k/w}\,\log k)$ in the word-RAM model with word length $w$. In the RAM models (including $AC^0$ and word-RAM) it is possible to read up to $\lfloor w/\log\sigma \rfloor$ characters in constant time if the characters of $T$ are encoded using $\lceil \log\sigma \rceil$ bits. The only solution for $k$-mismatches in packed text works in $O((n\log\sigma/\log n)\lceil m\log(k + \log n/\log\sigma)/w \rceil + n^{\varepsilon})$ time, for any $\varepsilon > 0$. We present an algorithm that runs in time $O\left(\frac{n}{\lfloor w/(m\log\sigma) \rfloor}\,(1 + \log\min(k,\sigma)\,\log m/\log\sigma)\right)$ in the $AC^0$ model if $m = O(w/\log\sigma)$ and $T$ is given packed. We also describe a simpler variant that runs in time $O\left(\frac{n}{\lfloor w/(m\log\sigma) \rfloor}\,\log\min(m, \log w/\log\sigma)\right)$ in the word-RAM model. The algorithms improve the existing bound for $w = \Omega(\log^{1+\varepsilon} n)$, for any $\varepsilon > 0$. Based on the introduced technique, we present algorithms for several other approximate matching problems.
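
To make the problem statement and the packed-text setting concrete, the following is a minimal sketch and not the algorithm of the paper. It packs each character into $\lceil \log\sigma \rceil$ bits, compares a whole window of $T$ against $P$ with one XOR, and counts the nonzero fields to obtain the Hamming distance. The function names (`pack`, `count_mismatches`, `k_mismatches`) are hypothetical, and a Python integer stands in for the machine word, so the sketch ignores the word length $w$ and makes no claim about the running times quoted in the abstract.

```python
from math import ceil, log2


def pack(s, alphabet):
    """Pack s into one integer, ceil(log2(sigma)) bits per character.

    Character i occupies bits [i*bits, (i+1)*bits). Hypothetical helper.
    """
    bits = max(1, ceil(log2(len(alphabet))))
    code = {c: i for i, c in enumerate(alphabet)}
    word = 0
    for i, c in enumerate(s):
        word |= code[c] << (i * bits)
    return word, bits


def count_mismatches(x, bits, m):
    """Count the nonzero fields among the m lowest bits-wide fields of x."""
    # Mask with a 1 at the lowest bit of each of the m fields.
    low = sum(1 << (i * bits) for i in range(m))
    # OR every bit of a field down into that field's lowest bit.
    folded = 0
    for j in range(bits):
        folded |= x >> j
    return bin(folded & low).count("1")


def k_mismatches(text, pattern, k, alphabet):
    """Return all positions i where text[i:i+m] is within Hamming distance k of pattern."""
    n, m = len(text), len(pattern)
    t, bits = pack(text, alphabet)
    p, _ = pack(pattern, alphabet)
    window = (1 << (m * bits)) - 1  # selects m packed characters
    hits = []
    for i in range(n - m + 1):
        # One XOR compares all m characters of the window at once.
        diff = ((t >> (i * bits)) & window) ^ p
        if count_mismatches(diff, bits, m) <= k:
            hits.append(i)
    return hits
```

For example, `k_mismatches("abcabd", "abc", 1, "abcd")` returns `[0, 3]`: the window at position 0 matches exactly, and the window `abd` at position 3 differs from `abc` in one character. The sketch still performs one step per text position; the algorithms described in the abstract instead exploit the packing to handle several alignments per word operation.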