Pattern matching in Lempel-Ziv compressed strings: fast, simple, and deterministic

  • Authors:
  • Paweł Gawrychowski

  • Affiliations:
  • Institute of Computer Science, University of Wrocław, Wrocław, Poland

  • Venue:
  • ESA'11: Proceedings of the 19th European Conference on Algorithms
  • Year:
  • 2011

Abstract

Countless variants of the Lempel-Ziv compression are widely used in many real-life applications. This paper is concerned with a natural modification of the classical pattern matching problem inspired by the popularity of such compression methods: given an uncompressed pattern p[1 .. m] and a Lempel-Ziv representation of a string t[1 .. N], does p occur in t? Farach and Thorup [5] gave a randomized O(n log^2(N/n) + m) time solution for this problem, where n is the size of the compressed representation of t. Building on the methods of [3] and [6], we improve their result by developing a faster and fully deterministic O(n log(N/n) + m) time algorithm with the same space complexity. Note that for highly compressible texts, log(N/n) might be of order n, so for such inputs the improvement is very significant. A small fragment of our method can be used to give an asymptotically optimal solution for the substring hashing problem considered by Farach and Muthukrishnan [4].
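
To make the problem statement concrete, the following is a minimal sketch of the naive baseline: decompress t fully, then search for p. It is not the algorithm of the paper. It assumes an LZ77-style factorization in which each phrase is either a single literal character or a (start, length) copy from the text decoded so far; this phrase format and the names decompress and occurs_naive are hypothetical, chosen only for illustration. The baseline needs time and space at least linear in N, which is exactly the cost that algorithms working directly on the n phrases try to avoid.

```python
from typing import List, Tuple, Union

# Hypothetical phrase format (an assumption, not the paper's notation):
#   - a one-character string is a literal;
#   - a pair (start, length) copies `length` characters beginning at
#     0-based position `start` of the text decoded so far. The copied
#     region may overlap the part that is still being written.
Phrase = Union[str, Tuple[int, int]]


def decompress(phrases: List[Phrase]) -> str:
    """Expand the phrase list into the full text t (length N)."""
    out: List[str] = []
    for phrase in phrases:
        if isinstance(phrase, str):
            out.extend(phrase)  # literal character(s)
        else:
            start, length = phrase
            # Copy one character at a time so that self-overlapping
            # copies (start + length > current length) work correctly.
            for i in range(length):
                out.append(out[start + i])
    return "".join(out)


def occurs_naive(pattern: str, phrases: List[Phrase]) -> bool:
    """Naive baseline: decompress everything, then search."""
    return pattern in decompress(phrases)


# Three phrases (n = 3) describe a text of length N = 16.
phrases: List[Phrase] = ["a", "b", (0, 14)]
text = decompress(phrases)
found = occurs_naive("baba", phrases)
missing = occurs_naive("aa", phrases)
```

In this toy input, replacing (0, 14) by (0, L) for a larger L grows N without adding phrases, which illustrates how N can be far larger than n for highly compressible texts.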