A simple storage scheme for strings achieving entropy bounds

  • Authors:
  • Paolo Ferragina; Rossano Venturini

  • Affiliations:
  • University of Pisa, Italy; University of Pisa, Italy

  • Venue:
  • SODA '07: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms
  • Year:
  • 2007

Abstract

We propose a storage scheme for a string S[1, n], drawn from an alphabet Σ, that requires space close to the k-th order empirical entropy of S and supports retrieval of any ℓ-long substring of S in optimal O(1 + ℓ/log_{|Σ|} n) time. This matches the best known bounds [14, 7] while using only binary encodings and tables. We also apply this storage scheme to prove new time versus space trade-offs for compressed self-indexes [5, 12] and the Burrows-Wheeler Transform [2].
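
The abstract describes the scheme only as "binary encodings and tables". As a rough illustration of how a block-based scheme of that kind can be organized, the following Python sketch cuts the string into fixed-length blocks, gives more frequent blocks shorter binary codewords, and answers substring queries by decoding only the blocks that overlap the query. It is a minimal sketch under stated assumptions, not the construction from the paper: the block length `b`, the codeword order, the plain `starts` offset list, and all function names are hypothetical choices made for this example. In particular, a plain list of offsets is far too large to meet any entropy bound; it stands in for whatever compact boundary structure a real implementation would use.

```python
from collections import Counter


def codeword(rank: int) -> str:
    """Return the rank-th string in the order '', '0', '1', '00', '01', ...

    Assumption for this sketch: rank 0 is the most frequent block.
    bin(rank + 1) looks like '0b1xyz'; dropping '0b1' leaves the codeword.
    """
    return bin(rank + 1)[3:]


def encode(s: str, b: int):
    """Split s into blocks of length b and encode each block by its rank."""
    blocks = [s[i:i + b] for i in range(0, len(s), b)]
    freq = Counter(blocks)
    # Most frequent blocks first; ties broken lexicographically so the
    # result is deterministic.
    ranked = sorted(freq, key=lambda blk: (-freq[blk], blk))
    rank_of = {blk: r for r, blk in enumerate(ranked)}

    pieces = []
    starts = [0]  # starts[j] = bit position where block j's codeword begins
    for blk in blocks:
        cw = codeword(rank_of[blk])
        pieces.append(cw)
        starts.append(starts[-1] + len(cw))
    # The codewords are not prefix-free, so the boundaries in `starts`
    # are required for decoding.
    return "".join(pieces), starts, ranked


def decode_block(bits: str, starts: list, ranked: list, j: int) -> str:
    """Recover block j by reading its codeword and looking up the table."""
    cw = bits[starts[j]:starts[j + 1]]
    rank = int("1" + cw, 2) - 1  # inverse of codeword()
    return ranked[rank]


def substring(bits, starts, ranked, b: int, i: int, length: int) -> str:
    """Return s[i:i + length] (0-based) by decoding only overlapping blocks."""
    if length <= 0:
        return ""
    first, last = i // b, (i + length - 1) // b
    last = min(last, len(starts) - 2)  # clamp to the final block
    text = "".join(
        decode_block(bits, starts, ranked, j) for j in range(first, last + 1)
    )
    offset = i - first * b
    return text[offset:offset + length]


# Usage example on a small, repetitive input.
s = "abracadabra_abracadabra"
b = 2
bits, starts, ranked = encode(s, b)
piece = substring(bits, starts, ranked, b, 4, 7)  # same as s[4:11]
```

Each query touches about `length / b + 1` blocks, which mirrors the shape of the O(1 + ℓ/log_{|Σ|} n) bound if `b` is taken proportional to log_{|Σ|} n and each block lookup costs constant time; that correspondence is an informal reading of the sketch rather than a claim about the paper's analysis.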