Space-efficient construction of LZ-index

  • Authors:
  • Diego Arroyuelo;Gonzalo Navarro

  • Affiliations:
  • Dept. of Computer Science, University of Chile, Santiago, Chile;Dept. of Computer Science, University of Chile, Santiago, Chile

  • Venue:
  • ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

A compressed full-text self-index is a data structure that replaces a text and in addition gives indexed access to it, while taking space proportional to the compressed text size. The LZ-index, in particular, requires 4uHk(1+o(1)) bits of space, where u is the text length in characters and Hk is its k-th order empirical entropy. Although in practice the LZ-index needs 1.0-1.5 times the text size, its construction requires much more main memory (around 5 times the text size), which limits its applicability to large texts. In this paper we present a practical space-efficient algorithm to construct LZ-index, requiring (4+ε)uHk+o(u) bits of space, for any constant 0εO(σu) time, being σ the alphabet size. Our experimental results show that our method is efficient in practice, needing an amount of memory close to that of the final index.