Simple and efficient LZW-Compressed multiple pattern matching

  • Authors:
  • Paweł Gawrychowski

  • Affiliations:
  • Institute of Computer Science, University of Wrocław, Poland; Max-Planck-Institut für Informatik, Saarbrücken, Germany

  • Venue:
  • CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
  • Year:
  • 2012

Abstract

We consider a natural variant of the classical multiple pattern matching problem: given a Lempel-Ziv-Welch representation (consisting of $n$ codewords) of a string $t[1..N]$ and a collection of (uncompressed) patterns $p_1, p_2, \ldots, p_\ell$ with $\sum_i |p_i| = M$, does any $p_i$ occur in $t$? As shown by Kida et al. [12], extending the single-pattern algorithm of Amir, Benson and Farach [2] gives a running time of $\mathcal{O}(n+M^{2})$ for this more general problem. We prove that it is in fact possible to achieve $\mathcal{O}(n\log M+M)$ or $\mathcal{O}(n+M^{1+\epsilon})$ time. Although not linear, the running time of our solution matches the single-pattern bounds achieved by [2] and [14], and it does so in a more structured and unified manner, without relying heavily on combinatorics on words. The only nontrivial components are the suffix array, constant-time range minimum queries, and any balanced binary search tree.
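To make the problem statement concrete, here is a minimal Python sketch of the naive baseline that compressed algorithms are designed to beat: decode the LZW codewords back into $t$ and search for each pattern directly. It is not the algorithm from the paper. The function names (`lzw_encode`, `lzw_decode`, `naive_compressed_match`) are hypothetical, and the sketch assumes the textbook LZW variant whose dictionary starts with the single letters of a known alphabet.

```python
from typing import Iterable, List, Optional, Sequence


def lzw_encode(text: str, alphabet: Sequence[str]) -> List[int]:
    """Textbook LZW encoder (assumed variant): the dictionary initially holds
    the single letters of `alphabet`; every letter of `text` must be in it."""
    dictionary = {ch: i for i, ch in enumerate(alphabet)}
    codes: List[int] = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch  # extend the longest phrase already in the dictionary
        else:
            codes.append(dictionary[current])
            dictionary[current + ch] = len(dictionary)  # new phrase = old phrase + one letter
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decode(codes: Iterable[int], alphabet: Sequence[str]) -> str:
    """Inverse of lzw_encode: rebuilds the dictionary while decoding."""
    phrases: List[str] = list(alphabet)
    out: List[str] = []
    prev: Optional[str] = None
    for code in codes:
        if code < len(phrases):
            phrase = phrases[code]
        else:
            # The code refers to the entry that is being defined right now
            # (the "KwKwK" case), so it must be prev + first letter of prev.
            assert prev is not None
            phrase = prev + prev[0]
        if prev is not None:
            phrases.append(prev + phrase[0])
        out.append(phrase)
        prev = phrase
    return "".join(out)


def naive_compressed_match(codes: Sequence[int], alphabet: Sequence[str],
                           patterns: Sequence[str]) -> bool:
    """Naive baseline: fully decompress t, then test each pattern.
    Its cost depends on the uncompressed length N, not only on n and M."""
    text = lzw_decode(codes, alphabet)
    return any(p in text for p in patterns)


if __name__ == "__main__":
    codes = lzw_encode("abababab", "ab")
    print(codes)  # [0, 1, 2, 4, 1]
    print(naive_compressed_match(codes, "ab", ["bb", "aba"]))  # True
```

Since the $k$-th LZW phrase has length at most $k$, the decompressed length $N$ can be as large as $\Theta(n^2)$ (for example, $t = a^N$ yields the phrases $a, aa, aaa, \ldots$), which is why decompressing first cannot achieve bounds of the form stated above.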