IEEE Transactions on Information Theory
We study the asymptotic redundancy of Huffman (and other) codes. It has been known since the inception of the Huffman (1952) code that in the worst case its redundancy, defined as the excess of the code length over the optimal (ideal) code length, is not more than one. However, to the best of our knowledge, no precise asymptotic results have been reported in the literature thus far. We consider here a memoryless binary source generating a sequence of length $n$ distributed as binomial $(n, p)$, with $p$ being the probability of emitting 0. Based on the results of Stubley (1994), we prove that for $p < 1/2$ the average redundancy $\bar{R}_n^H$ of the Huffman code becomes, as $n \to \infty$:

\[
\bar{R}_n^H =
\begin{cases}
\dfrac{3}{2} - \dfrac{1}{\ln 2} + o(1) = 0.057304\ldots + o(1), & \alpha \text{ irrational}, \\[2ex]
\dfrac{3}{2} - \dfrac{1}{M}\left(\langle \beta M n \rangle - \dfrac{1}{2}\right) - \dfrac{1}{M\left(1 - 2^{-1/M}\right)}\, 2^{-\langle n \beta M \rangle / M} + O(\rho^n), & \alpha = N/M \text{ rational},
\end{cases}
\]

where $\alpha = \log_2 \frac{1-p}{p}$ and $\beta = -\log_2 (1-p)$, $\rho < 1$, $M$ and $N$ are integers such that $\gcd(N, M) = 1$, and $\langle x \rangle = x - \lfloor x \rfloor$ is the fractional part of $x$. The appearance of the fractal-like function $\langle \beta M n \rangle$ explains the erratic behavior of the Huffman redundancy and its resistance to precise analysis. As a side result, we prove that the average redundancy of the Shannon block code is, as $n \to \infty$:

\[
\bar{R}_n^S =
\begin{cases}
\dfrac{1}{2} + o(1), & \alpha \text{ irrational}, \\[2ex]
\dfrac{1}{2} - \dfrac{1}{M}\left(\langle M n \beta \rangle - \dfrac{1}{2}\right) + O(\rho^n), & \alpha = N/M \text{ rational},
\end{cases}
\]

where $\rho < 1$. Finally, we derive the redundancy of the Golomb (1966) code (for the geometric distribution), which can be viewed as a special case of the Huffman and Shannon codes. The redundancy of the Golomb code has only oscillating behavior (i.e., there is no convergent mode). These findings are obtained by analytic methods such as the theory of distribution of sequences modulo 1 and Fourier series.
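The Shannon block code result can be checked numerically for moderate n. The Python sketch below is illustrative only and not part of the paper; the function names are hypothetical. It assumes the Shannon block code assigns a block of probability P the length ⌈-log2 P⌉, sums the resulting redundancy over the binomial block types, and compares the sum with the closed forms quoted in the abstract. The Huffman function only evaluates the stated asymptotic expression; it does not build a Huffman code.

```python
import math
from typing import Optional


def frac(x: float) -> float:
    """Fractional part <x> = x - floor(x)."""
    return x - math.floor(x)


def shannon_redundancy_exact(n: int, p: float) -> float:
    """Average redundancy of the Shannon block code, summed over block types.

    Assumption: a block with k zeros has probability p^k (1-p)^(n-k) and is
    given code length ceil(-log2 of that probability). Plain floats are used,
    so this is meant for moderate n only (a few hundred at most).
    """
    q = 1.0 - p
    total = 0.0
    for k in range(n + 1):
        # Ideal (non-integer) code length of a block with k zeros, in bits.
        ideal = -(k * math.log2(p) + (n - k) * math.log2(q))
        # Probability that a block contains exactly k zeros.
        weight = math.comb(n, k) * p**k * q**(n - k)
        total += weight * (math.ceil(ideal) - ideal)
    return total


def shannon_redundancy_asymptotic(n: int, p: float, M: Optional[int] = None) -> float:
    """Leading term from the abstract; pass M when alpha = N/M is rational."""
    if M is None:
        return 0.5
    beta = -math.log2(1.0 - p)
    return 0.5 - (frac(M * n * beta) - 0.5) / M


def huffman_redundancy_asymptotic(n: int, p: float, M: Optional[int] = None) -> float:
    """Leading term of the Huffman redundancy as stated in the abstract."""
    if M is None:
        return 1.5 - 1.0 / math.log(2)
    beta = -math.log2(1.0 - p)
    t = frac(M * n * beta)
    return 1.5 - (t - 0.5) / M - 2.0 ** (-t / M) / (M * (1.0 - 2.0 ** (-1.0 / M)))


if __name__ == "__main__":
    # p = sqrt(2) - 1 gives (1 - p) / p = sqrt(2), so alpha = 1/2 (N = 1, M = 2).
    p_rational = math.sqrt(2) - 1
    for n in (10, 20, 50):
        exact = shannon_redundancy_exact(n, p_rational)
        approx = shannon_redundancy_asymptotic(n, p_rational, M=2)
        print(n, exact, approx)
```

With p = 1/3 we get α = 1 (M = 1), and every block has the same fractional code-length excess, so the exact sum and the closed form agree up to floating-point error. For irrational α the o(1) term can decay slowly, so agreement with 1/2 at small n should not be expected to be tight.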