Note: A simple storage scheme for strings achieving entropy bounds

  • Authors:
  • Paolo Ferragina; Rossano Venturini

  • Affiliations:
  • Dipartimento di Informatica, University of Pisa, Italy; Dipartimento di Informatica, University of Pisa, Italy

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2007

Abstract

We propose a storage scheme for a string S[1,n], drawn from an alphabet Σ, that requires space close to the k-th order empirical entropy of S, and allows one to retrieve any substring of S of length ℓ in optimal O(1 + ℓ / log_{|Σ|} n) time. This matches the best known bounds [R. Gonzalez, G. Navarro, Statistical encoding of succinct data structures, in: Proc. CPM, LNCS, vol. 4009, 2006, pp. 295-306; K. Sadakane, R. Grossi, Squeezing succinct data structures into entropy bounds, in: Proc. ACM-SIAM SODA, 2006, pp. 1230-1239], via the use of binary encodings and tables only. We also apply our storage scheme to the Burrows-Wheeler Transform [M. Burrows, D. Wheeler, A block sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, 1994], and achieve a space bound which depends on both the k-th order entropy of S and the k-th order entropy of its BW-transformed string bwt(S).
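
To make the space bound concrete, the k-th order empirical entropy mentioned above can be computed directly from its standard definition: H_k(S) = (1/n) Σ_w |w_S| · H_0(w_S), where w ranges over the length-k contexts occurring in S and w_S is the sequence of symbols that follow w in S. The Python sketch below is illustrative only and is not taken from the paper; the function names `h0` and `hk` are assumptions introduced here, and the code measures the entropy rather than implementing the storage scheme itself.

```python
import math
from collections import Counter, defaultdict


def h0(symbols):
    """Zeroth-order empirical entropy of a sequence, in bits per symbol."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    # Sum over distinct symbols of (c/n) * log2(n/c).
    return sum(c / n * math.log2(n / c) for c in counts.values())


def hk(s, k):
    """k-th order empirical entropy of string s, in bits per symbol.

    Hypothetical helper: groups each symbol by the k symbols that
    precede it, then averages the zeroth-order entropy of each group,
    weighted by group size and normalised by len(s).
    """
    if not s:
        return 0.0
    if k == 0:
        return h0(s)
    followers = defaultdict(list)
    for i in range(k, len(s)):
        followers[s[i - k:i]].append(s[i])
    return sum(len(f) * h0(f) for f in followers.values()) / len(s)
```

For the string "abababab", `hk(s, 0)` is 1 bit per symbol, while `hk(s, 1)` is 0, because each symbol is fully determined by its predecessor. A gap of this kind between H_0 and H_k is what a k-th order entropy bound can exploit.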