Efficient LZ78 factorization of grammar compressed text

  • Authors:
  • Hideo Bannai;Shunsuke Inenaga;Masayuki Takeda

  • Affiliations:
  • Department of Informatics, Kyushu University, Japan;Department of Informatics, Kyushu University, Japan;Department of Informatics, Kyushu University, Japan

  • Venue:
  • SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context free grammar in the Chomsky normal form that generates a single string. Given an SLP of size n representing a text S of length N, our algorithm computes the LZ78 factorization of T in $O(n\sqrt{N}+m\log N)$ time and $O(n\sqrt{N}+m)$ space, where m is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the $n\sqrt{N}$ term in the time and space complexities becomes either nL, where L is the length of the longest LZ78 factor, or (N−α) where α≥0 is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of S of a certain length. Since m=O(N/logσN) where σ is the alphabet size, the latter is asymptotically at least as fast as a linear time algorithm which runs on the uncompressed string when σ is constant, and can be more efficient when the text is compressible, i.e. when m and n are small.