Squeezing succinct data structures into entropy bounds

  • Authors:
  • Kunihiko Sadakane; Roberto Grossi

  • Affiliations:
  • Kyushu University, Higashi-ku, Japan; Università di Pisa, Largo Bruno, Italy

  • Venue:
  • SODA '06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete Algorithms
  • Year:
  • 2006

Abstract

Consider a sequence S of n symbols drawn from an alphabet A = {1, 2, ..., σ}, stored as a binary string of n log σ bits. A succinct data structure on S supports a given set of primitive operations on S using just f(n) = o(n log σ) extra bits. We present a technique for transforming succinct data structures (which do not change the binary content of S) into compressed data structures using nH_k + f(n) + O(n(log σ + log log_σ n + k)/log_σ n) bits of space, where H_k ≤ log σ is the kth-order empirical entropy of S. When k + log σ = o(log n), we improve the space complexity of the succinct data structure from n log σ + o(n log σ) to nH_k + o(n log σ) bits by keeping S in compressed format, so that any substring of O(log_σ n) symbols of S (i.e. O(log n) bits) can be decoded on the fly in constant time. Thus, the time complexity of the supported operations does not change asymptotically: if an operation takes t(n) time in the succinct data structure, it requires O(t(n)) time in the resulting compressed data structure. Using this simple approach we improve the space complexity of some of the best known results on succinct data structures. We extend our results to handle another definition of entropy.
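The core idea, keeping S in blocks of Θ(log_σ n) symbols so that any short substring can be decoded on the fly, can be illustrated with a toy sketch. This is an assumption-laden illustration, not the paper's actual encoding: the function names (build_blocks, substring) and the dictionary-based block table are invented for exposition, and real nH_k-bounded compression of the blocks is far more careful.

```python
import math

# Toy sketch: block-wise storage of S with constant-time block decoding.
# Illustration only -- the paper's actual scheme compresses the blocks
# to reach the nH_k space bound; here each block is just an index into
# a table of distinct blocks.

def build_blocks(S, sigma):
    """Split S (n symbols over {1..sigma}, n >= 2, sigma >= 2) into
    blocks of b ~ (1/2) log_sigma n symbols; store one code per block."""
    n = len(S)
    b = max(1, int(math.log(n, sigma)) // 2)
    table = {}   # block contents -> code
    codes = []   # one code per block of S
    for i in range(0, n, b):
        blk = tuple(S[i:i + b])
        if blk not in table:
            table[blk] = len(table)
        codes.append(table[blk])
    decode = {c: blk for blk, c in table.items()}  # code -> block, O(1)
    return codes, decode, b

def substring(codes, decode, b, i, length):
    """Return S[i:i+length]. Each symbol costs one O(1) table lookup;
    on a word RAM a whole block of O(log n) bits fits in one word, so a
    run of O(log_sigma n) symbols decodes in constant time."""
    out = []
    for j in range(i, i + length):
        blk_idx, off = divmod(j, b)
        out.append(decode[codes[blk_idx]][off])
    return out
```

Any succinct data structure whose operations only read O(log_σ n)-symbol substrings of S can then run unchanged on the block representation, which is why the t(n) operation time is preserved up to constants.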