Simple Random Access Compression

  • Authors:
  • Kimmo Fredriksson; Fedor Nikitin

  • Affiliations:
  • (Corresponding author) Department of Computer Science, University of Kuopio, P.O. Box 1627, 70211 Kuopio, Finland. kimmo.fredriksson@uku.fi; Saint-Petersburg State University, Faculty of Applied Mathematics and Control Processes, Universitetskii prospekt 35, Petergof, Saint-Petersburg, Russia 198504. fedor.nikitin@gmail.com

  • Venue:
  • Fundamenta Informaticae
  • Year:
  • 2009

Abstract

Given a sequence $S$ of $n$ symbols over an alphabet $\Sigma$ of size $\sigma$, we develop new compression methods that (i) are very simple to implement and (ii) provide $O(1)$-time random access to any symbol (or short substring) of the original sequence. Our simplest solution uses at most $2h + o(h)$ bits of space, where $h = n(H_0(S) + 1)$ and $H_0(S)$ is the zeroth-order empirical entropy of $S$. We discuss a number of improvements and trade-offs over the basic method. For example, we can achieve $n(H_k(S) + 1) + o(n(H_k(S) + 1))$ bits of space for $k = o(\log_\sigma n)$. Several applications are discussed, including text compression, (compressed) full-text indexing, and string matching.
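
To make the space bound concrete, the quantities in the abstract can be evaluated for a given input. The sketch below is not taken from the paper: it uses the standard definition of zeroth-order empirical entropy, $H_0(S) = \sum_c (n_c/n) \log_2(n/n_c)$ with $n_c$ the number of occurrences of symbol $c$, and the function names `empirical_entropy_h0` and `space_bound_bits` are hypothetical. It computes only $h$ and the leading term $2h$; the $o(h)$ term is ignored, and no compression is performed.

```python
import math
from collections import Counter


def empirical_entropy_h0(s):
    """Zeroth-order empirical entropy of s, in bits per symbol.

    Standard definition: sum over distinct symbols c of
    (n_c / n) * log2(n / n_c), where n_c is the count of c in s.
    """
    n = len(s)
    if n == 0:
        return 0.0
    counts = Counter(s)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def space_bound_bits(s):
    """Return (h, 2h) where h = n * (H_0(S) + 1).

    Illustrative helper (hypothetical name): 2h is only the leading
    term of the 2h + o(h) bound quoted in the abstract.
    """
    n = len(s)
    h = n * (empirical_entropy_h0(s) + 1)
    return h, 2 * h
```

For example, `space_bound_bits("abab")` gives `h = 8` and a leading term of 16 bits, since the two symbols are equally frequent and $H_0 = 1$ bit per symbol.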