Simple compression code supporting random access and fast string matching

  • Authors: Kimmo Fredriksson; Fedor Nikitin

  • Affiliation (both authors): Department of Computer Science and Statistics, University of Joensuu, Joensuu, Finland

  • Venue: WEA'07: Proceedings of the 6th International Conference on Experimental Algorithms
  • Year: 2007

Abstract

Given a sequence S of n symbols over some alphabet Σ, we develop a new compression method that (i) is very simple to implement; (ii) provides O(1)-time random access to any symbol of the original sequence; and (iii) allows efficient pattern matching over the compressed sequence. Our simplest solution uses at most 2h + o(h) bits of space, where h = n(H_0(S) + 1) and H_0(S) is the zeroth-order empirical entropy of S. We discuss a number of improvements and trade-offs over the basic method. The new method is applied to text compression. We also propose average-case optimal string matching algorithms.
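
To make the space bound concrete, the following minimal sketch computes the zeroth-order empirical entropy of a sequence and the leading term 2h of the bound, using the standard definition H_0(S) = Σ_c (n_c/n) log2(n/n_c), where n_c is the number of occurrences of symbol c. It does not implement the compression method itself, which the abstract does not describe. The function names are illustrative assumptions, and the o(h) term is ignored.

```python
import math
from collections import Counter


def zeroth_order_entropy(seq):
    """Zeroth-order empirical entropy H_0(S), in bits per symbol."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    # Standard definition: sum over distinct symbols of (n_c/n) * log2(n/n_c).
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def space_bound_leading_term(seq):
    """Leading term 2h of the stated bound, with h = n * (H_0(S) + 1).

    The lower-order o(h) term is deliberately left out (assumption of
    this sketch), so this is not the exact size of any real encoding.
    """
    n = len(seq)
    h = n * (zeroth_order_entropy(seq) + 1)
    return 2 * h


if __name__ == "__main__":
    text = "abab"
    print(zeroth_order_entropy(text))      # 1.0 bit per symbol
    print(space_bound_leading_term(text))  # 16.0 bits
```

For the sequence "abab", both symbols have relative frequency 1/2, so H_0 = 1 bit per symbol, h = 4 · (1 + 1) = 8, and the leading term of the bound is 16 bits, compared with 32 bits for a plain 8-bit-per-symbol encoding.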