Huffman coding with unequal letter costs

  • Authors:
  • Mordecai J. Golin; Claire Kenyon; Neal E. Young

  • Affiliations:
  • Hong Kong UST, Kowloon, Hong Kong; Université Paris-Sud, France; Akamai Technologies, Cambridge, MA

  • Venue:
  • STOC '02: Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
  • Year:
  • 2002

Abstract

In the standard Huffman coding problem, one is given a set of words and, for each word, a positive frequency. The goal is to encode each word $w$ as a codeword $c(w)$ over a given alphabet. The encoding must be prefix-free (no codeword is a prefix of any other) and should minimize the weighted average codeword size $\sum_w \mathrm{freq}(w)\,|c(w)|$. The problem has a well-known polynomial-time algorithm due to Huffman [15].

Here we consider the generalization in which the letters of the encoding alphabet may have non-uniform lengths. The goal is to minimize the weighted average codeword length $\sum_w \mathrm{freq}(w)\,\mathrm{cost}(c(w))$, where $\mathrm{cost}(s)$ is the sum of the (possibly non-uniform) lengths of the letters in $s$. Despite much previous work, the problem is not known to be NP-hard, nor was it previously known to have a polynomial-time approximation algorithm. Here we describe a polynomial-time approximation scheme (PTAS) for the problem.
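To make the two objectives concrete, here is a minimal Python sketch. It is not the paper's PTAS. It runs the classical binary Huffman algorithm, which is optimal only when both letters have the same length, checks the prefix-free property, and then scores the resulting code under the unequal-cost objective. The function names (`huffman_code`, `is_prefix_free`, `code_cost`), the sample frequencies, and the letter lengths 1 and 2 are hypothetical, chosen only for illustration.

```python
import heapq
from itertools import count


def huffman_code(freq):
    """Classical binary Huffman coding; assumes both letters have length 1."""
    if len(freq) == 1:
        # A lone word still needs a non-empty codeword.
        return {w: "0" for w in freq}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(f, next(tie), {w: ""}) for w, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; the first one popped goes under "0".
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def is_prefix_free(codewords):
    """True if no codeword is a prefix of another.

    After sorting, a word that is a prefix of some other word is a prefix
    of its immediate successor, so checking neighbours is enough.
    """
    words = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def code_cost(code, freq, letter_cost):
    """sum_w freq(w) * cost(c(w)), where cost(s) sums the letter lengths in s."""
    return sum(freq[w] * sum(letter_cost[ch] for ch in c) for w, c in code.items())


freq = {"a": 5, "b": 2, "c": 1, "d": 1}
code = huffman_code(freq)  # {"a": "1", "b": "00", "c": "010", "d": "011"}
unequal = {"0": 1, "1": 2}  # illustrative letter lengths (hypothetical)

print(code_cost(code, freq, {"0": 1, "1": 1}))  # 15
print(code_cost(code, freq, unequal))           # 23

# Same tree with the letters swapped, so the most frequent word gets the cheaper letter.
flipped = {w: c.translate(str.maketrans("01", "10")) for w, c in code.items()}
print(is_prefix_free(flipped.values()), code_cost(flipped, freq, unequal))  # True 22
```

Merely relabeling the Huffman tree lowers the unequal-cost objective from 23 to 22, which illustrates why the equal-cost algorithm does not directly solve the generalized problem. The sketch makes no claim that 22 is optimal for these letter lengths.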