Reducing coding redundancy in LZW

  • Authors: Gopal Lakhani

  • Affiliations: Texas Tech University, Lubbock, TX 79409, United States

  • Venue: Information Sciences: an International Journal
  • Year: 2006

Abstract

LZW is a widely used text compression algorithm. It has long been known that its flat coding method causes redundancy. In this article, we identify three sources of redundancy: (1) Flat coding does not use the code space fully. (2) LZW considers the entire dictionary, including leaf nodes, when computing the codeword size, whereas leaf nodes would not be encoded. (3) LZW assigns an index to a node well before it would use the node for output. For (1), we present an elegant implementation of an approach known as phase-in binary coding. Our formulation incurs virtually no overhead and can also be applied to integer sequence coding. For (2) and (3), we assign double indices to dictionary nodes to identify nodes that should not be considered for coding. Our method generates some extra code symbols. To reduce the coding cost of these symbols, we use arithmetic coding for the final output. Experimental results on several benchmark datasets compare the code size reduction of our methods with some published results.
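
To make redundancy source (1) concrete, the sketch below shows phase-in binary coding in its textbook form, often called truncated binary coding. It is not taken from the article and may differ from the formulation the author presents. The function names `phase_in_encode` and `phase_in_decode` are hypothetical, and bits are held in Python strings purely for readability. When the number of valid codes n is not a power of two, the first u = 2^(k+1) - n values, with k = floor(log2 n), receive k bits, and the remaining values receive k + 1 bits.

```python
def phase_in_encode(x: int, n: int) -> str:
    """Encode x in [0, n) with a phase-in (truncated) binary code.

    Illustrative sketch only; names and string-based bit handling are assumptions.
    """
    if not 0 <= x < n:
        raise ValueError("x must satisfy 0 <= x < n")
    k = n.bit_length() - 1          # k = floor(log2 n)
    u = (1 << (k + 1)) - n          # number of values that get the short k-bit code
    if x < u:
        # Short codeword: x written in exactly k bits (empty when k == 0).
        return format(x, "b").zfill(k) if k else ""
    # Long codeword: x + u written in exactly k + 1 bits.
    return format(x + u, "b").zfill(k + 1)


def phase_in_decode(bits: str, n: int, pos: int = 0) -> tuple[int, int]:
    """Decode one value from bits starting at pos; return (value, next position)."""
    k = n.bit_length() - 1
    u = (1 << (k + 1)) - n
    v = int(bits[pos:pos + k], 2) if k else 0
    pos += k
    if v >= u:
        # The k-bit prefix marks a long codeword: read one more bit, undo the offset.
        v = ((v << 1) | int(bits[pos])) - u
        pos += 1
    return v, pos


if __name__ == "__main__":
    n = 5  # e.g. a dictionary that currently holds five valid codes
    for x in range(n):
        print(x, phase_in_encode(x, n))
```

With n = 5, flat coding spends 3 bits on every index, while the sketch above spends 2 bits on indices 0 to 2 and 3 bits on indices 3 and 4. The saving disappears when n is a power of two, where both schemes coincide.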