Reducing the space requirement of LZ-Index

  • Authors:
  • Diego Arroyuelo;Gonzalo Navarro;Kunihiko Sadakane

  • Affiliations:
  • Dept. of Computer Science, Universidad de Chile;Dept. of Computer Science, Universidad de Chile;Dept. of Computer Science and Communication Engineering, Kyushu University, Japan

  • Venue:
  • CPM'06: Proceedings of the 17th Annual Conference on Combinatorial Pattern Matching
  • Year:
  • 2006

Abstract

The LZ-index is a compressed full-text self-index able to represent a text $T_{1 \ldots u}$, over an alphabet of size $\sigma = O(\textrm{polylog}(u))$ and with $k$-th order empirical entropy $H_k(T)$, using $4uH_k(T) + o(u\log\sigma)$ bits for any $k = o(\log_\sigma u)$. It can report all the $occ$ occurrences of a pattern $P_{1 \ldots m}$ in $T$ in $O(m^3\log\sigma + (m + occ)\log u)$ worst-case time. Its main drawback is the factor 4 in its space complexity, which makes it larger than other state-of-the-art alternatives. In this paper we present two different approaches to reduce the space requirement of the LZ-index. In both cases we achieve $(2 + \epsilon)uH_k(T) + o(u\log\sigma)$ bits of space, for any constant $\epsilon > 0$, and we simultaneously improve the search time to $O(m^2\log m + (m + occ)\log u)$. Both indexes support displaying any subtext of length $\ell$ in optimal $O(\ell/\log_\sigma u)$ time. In addition, we show how the space can be squeezed to $(1 + \epsilon)uH_k(T) + o(u\log\sigma)$ to obtain a structure with $O(m^2)$ average search time for $m \geqslant 2\log_\sigma{u}$.
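To give a rough sense of what the constant factors in the space bounds mean, the following sketch compares only the leading terms of the three bounds quoted in the abstract. The text length, entropy value, and ε used here are hypothetical numbers chosen for illustration, not figures from the paper, and the lower-order o(u log σ) term is ignored, so real index sizes would differ.

```python
def leading_term_bits(factor: float, u: int, hk: float) -> float:
    """Leading space term factor * u * H_k(T), in bits.

    Ignores the o(u log sigma) lower-order term of the bounds.
    """
    return factor * u * hk


# Hypothetical inputs (assumptions for illustration only).
u = 100 * 2**20   # text length in symbols
hk = 2.0          # k-th order empirical entropy, in bits per symbol
eps = 0.1         # some constant epsilon > 0

original = leading_term_bits(4, u, hk)        # 4 u H_k(T)
reduced = leading_term_bits(2 + eps, u, hk)   # (2 + eps) u H_k(T)
squeezed = leading_term_bits(1 + eps, u, hk)  # (1 + eps) u H_k(T)

for name, bits in [
    ("4 u Hk", original),
    ("(2+eps) u Hk", reduced),
    ("(1+eps) u Hk", squeezed),
]:
    # Convert bits to MiB for readability.
    print(f"{name:>13}: {bits / 8 / 2**20:.1f} MiB")
```

Under these assumed values, the ratio between the reduced and original leading terms is simply (2 + ε)/4, independent of u and of the entropy.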