Simple and efficient LZW-compressed multiple pattern matching

  • Authors:
  • Paweł Gawrychowski

  • Affiliations:
  • Institute of Computer Science, University of Wrocław, Wrocław, Poland and Max-Planck-Institut für Informatik, Saarbrücken, Germany

  • Venue:
  • Journal of Discrete Algorithms
  • Year:
  • 2014

Abstract

We consider a natural variant of the classical multiple pattern matching problem: given a Lempel-Ziv-Welch representation of a string and a collection of (uncompressed) patterns, does any of them occur in the text? As shown by Kida et al. [15], extending the single pattern algorithm of Amir, Benson and Farach [2] gives a running time of O(n + M^2) for the more general case, where n is the number of codewords in the compressed representation of the text and M is the sum of the lengths of all patterns. We prove that in fact it is possible to achieve O(n log M + M) or O(n + M^{1+ε}) complexity. While not linear, the running times of our solutions match the single pattern bounds achieved by the previously known solutions [2,17] in a more structured and unified manner, and without using any combinatorics on words. The only nontrivial components of our method are suffix arrays, constant time range minimum queries, and balanced binary search trees.
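To make the problem statement concrete, the following Python sketch shows the naive baseline that the abstract's bounds improve on: decode the LZW codewords back into the full text, then test each pattern directly. It is not the algorithm of the paper, which works on the n codewords without decompressing. The function names (`lzw_compress`, `lzw_decompress`, `any_pattern_occurs_naive`) and the convention that the initial dictionary holds one entry per alphabet character are assumptions made for this illustration.

```python
def lzw_compress(text, alphabet):
    """Textbook LZW: return the list of codewords for `text`.

    Assumption: codes 0..len(alphabet)-1 are the single characters of
    `alphabet`, and every character of `text` belongs to `alphabet`.
    """
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in text:
        if current + ch in table:
            current += ch                      # extend the current phrase
        else:
            codes.append(table[current])       # emit the longest known phrase
            table[current + ch] = len(table)   # register the new phrase
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes, alphabet):
    """Rebuild the text from its LZW codewords."""
    entries = list(alphabet)
    if not codes:
        return ""
    prev = entries[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code < len(entries):
            cur = entries[code]
        elif code == len(entries):
            # The code refers to the phrase being defined right now.
            cur = prev + prev[0]
        else:
            raise ValueError("invalid LZW codeword")
        entries.append(prev + cur[0])
        out.append(cur)
        prev = cur
    return "".join(out)


def any_pattern_occurs_naive(codes, alphabet, patterns):
    """Baseline only: decompress fully, then search for each pattern."""
    text = lzw_decompress(codes, alphabet)
    return any(p in text for p in patterns)


codes = lzw_compress("abababab", "ab")
print(len(codes), any_pattern_occurs_naive(codes, "ab", ["bb", "aba"]))
```

The cost of this baseline grows with the length of the decompressed text, which can be much larger than the number of codewords n. That gap is why bounds stated in terms of n and M are of interest.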