A lower bound on compression of unknown alphabets

  • Authors:
  • Nikola Jevtić; Alon Orlitsky; Narayana P. Santhanam

  • Affiliations:
  • ECE Department, University of California, San Diego, La Jolla, CA; ECE Department, University of California, San Diego, La Jolla, CA and CSE Department, University of California, San Diego, La Jolla, CA; ECE Department, University of California, San Diego, La Jolla, CA

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2005

Quantified Score

Hi-index 5.23

Abstract

Many applications call for universal compression of strings over large, possibly infinite, alphabets. However, it has long been known that the resulting redundancy is infinite even for i.i.d. distributions. It was recently shown that the redundancy of the strings' patterns, which abstract the values of the symbols, retaining only their relative precedence, is sublinear in the blocklength n, hence the per-symbol redundancy diminishes to zero. In this paper we show that pattern redundancy is at least (1.5 log_2 e) n^(1/3) bits. To do so, we construct a generating function whose coefficients lower bound the redundancy, and use Hayman's saddle-point approximation technique to determine the coefficients' asymptotic behavior.
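
As an illustration of what a pattern is, the following minimal Python sketch replaces each symbol with the index of its first appearance, so that only the order in which distinct symbols appear is retained. It also evaluates the expression (1.5 log_2 e) n^(1/3) for a given n. The function names `pattern` and `redundancy_lower_bound_bits` are hypothetical and chosen for this example; they do not come from the paper, and the abstract does not state for which values of n the bound holds.

```python
import math


def pattern(sequence):
    """Return the pattern of a sequence: each symbol is replaced by the
    1-based index of its first appearance among the distinct symbols."""
    first_seen = {}  # symbol -> index assigned on first appearance
    result = []
    for symbol in sequence:
        if symbol not in first_seen:
            first_seen[symbol] = len(first_seen) + 1
        result.append(first_seen[symbol])
    return result


def redundancy_lower_bound_bits(n):
    """Evaluate (1.5 * log2(e)) * n^(1/3), the expression quoted in the
    abstract. This only computes the formula; it is not a proof."""
    return 1.5 * math.log2(math.e) * n ** (1 / 3)


if __name__ == "__main__":
    print(pattern("abracadabra"))  # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
    print(round(redundancy_lower_bound_bits(1000), 2))
```

Two strings such as "abracadabra" and "xyzxwxvxyzx" have the same pattern, which is why a pattern can be compressed without knowing the underlying alphabet.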