Faster fully compressed pattern matching by recompression

  • Authors:
  • Artur Jeż

  • Affiliations:
  • Institute of Computer Science, University of Wrocław, Poland

  • Venue:
  • ICALP'12 Proceedings of the 39th international colloquium conference on Automata, Languages, and Programming - Volume Part I
  • Year:
  • 2012

Abstract

In this paper, the fully compressed pattern matching problem is studied. The compression is represented by straight-line programs (SLPs), i.e. context-free grammars that each generate exactly one string; the term "fully" means that both the pattern and the text are given in compressed form. The problem is approached using a recently developed technique of local recompression: the SLPs are refactored so that substrings of the pattern and of the text are encoded in the same way in both SLPs. To this end, the SLPs are locally decompressed and then recompressed in a uniform way. This technique yields an $\mathcal{O}((n+m)\log M \log(n+m))$ algorithm for compressed pattern matching, where $n$ ($m$) is the size of the compressed representation of the text (pattern, respectively), while $M$ is the size of the decompressed pattern. Since $M \le 2^m$, and hence $\log M \le m$, this substantially improves the previously best $\mathcal{O}(m^2 n)$ algorithm. Since the LZ compression standard reduces to SLPs with a $\log(N/n)$ overhead and in $\mathcal{O}(n \log(N/n))$ time, where $N$ is the size of the decompressed text, the presented algorithm can also be applied to the fully LZ-compressed pattern matching problem, yielding an $\mathcal{O}(s \log s \log M)$ running time, where $s = n\log(N/n) + m\log(M/m)$.
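To make the SLP model concrete, the following Python sketch illustrates the input representation only; it is not the paper's recompression algorithm. It assumes a hypothetical encoding in which an SLP is a bottom-up list of rules, each either a single terminal character or a pair of indices of earlier rules, and it assumes that the size of an SLP is its number of rules. Under these assumptions, the length of the derived string can be computed in time linear in the number of rules, and single characters can be read by descending the grammar, both without decompressing. The hypothetical doubling grammar shows why $M \le 2^m$: each pair rule at most doubles the longest length derived so far, so $m$ rules derive at most $2^{m-1}$ characters.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical SLP encoding for this sketch: rule i is either a terminal
# character or a pair (left, right) of indices of earlier rules (< i).
# The last rule is the start symbol.
Rule = Union[str, Tuple[int, int]]


def expansion_lengths(rules: List[Rule]) -> List[int]:
    """Length of the string derived by every rule, in O(len(rules)) time."""
    lengths: List[int] = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)
        else:
            left, right = rule
            lengths.append(lengths[left] + lengths[right])
    return lengths


def expand(rules: List[Rule], i: Optional[int] = None) -> str:
    """Fully decompress rule i (default: start symbol).
    Only for small examples: the output may be exponential in len(rules)."""
    if i is None:
        i = len(rules) - 1
    out: List[str] = []
    stack = [i]
    while stack:
        rule = rules[stack.pop()]
        if isinstance(rule, str):
            out.append(rule)
        else:
            left, right = rule
            stack.append(right)  # pushed first, so it is expanded after left
            stack.append(left)
    return "".join(out)


def char_at(rules: List[Rule], lengths: List[int], pos: int,
            i: Optional[int] = None) -> str:
    """Character at 0-based position pos of the string derived by rule i,
    found by walking down the grammar instead of decompressing it."""
    if i is None:
        i = len(rules) - 1
    if not 0 <= pos < lengths[i]:
        raise IndexError(pos)
    while True:
        rule = rules[i]
        if isinstance(rule, str):
            return rule
        left, right = rule
        if pos < lengths[left]:
            i = left
        else:
            pos -= lengths[left]
            i = right


def doubling_slp(m: int) -> List[Rule]:
    """m rules: X0 -> 'a' and X_k -> X_{k-1} X_{k-1}; derives 2**(m-1) letters."""
    return ["a"] + [(k - 1, k - 1) for k in range(1, m)]


# Fibonacci-word SLP: b, a, ab, aba, abaab, abaababa
fib: List[Rule] = ["b", "a", (1, 0), (2, 1), (3, 2), (4, 3)]
lengths = expansion_lengths(fib)
print(expand(fib), lengths[-1])                            # abaababa 8
print(char_at(fib, lengths, 4))                            # b
print(expansion_lengths(doubling_slp(40))[-1] == 2 ** 39)  # True
```

The doubling example is the extreme case: 40 rules describe a string of about $5.5 \cdot 10^{11}$ letters, which is why an algorithm whose cost depends on $\log M$ rather than on $M$ matters.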