Reducing coding redundancy in LZW

Authors:
Gopal Lakhani
Affiliations:
Texas Tech University, Lubbock, TX 79409, United States
Venue:
Information Sciences: an International Journal
Year:
2006

Abstract

LZW is a widely used text compression algorithm. It has long been known that its flat coding method causes redundancy. In this article, we identify three sources of redundancy: (1) Flat coding does not use the code space fully. (2) LZW considers the entire dictionary, including leaf nodes, when computing the codeword size, whereas leaf nodes would not be encoded. (3) LZW assigns an index to a node well before it would use the node for output. For (1), we present an elegant implementation of an approach known as phase-in binary coding. A characteristic of our formulation is that it incurs virtually no overhead and can also be applied to integer sequence coding. For (2) and (3), we assign double indices to dictionary nodes to identify nodes that should not be considered for coding. Our method generates some extra code symbols. To reduce the coding cost of these symbols, we use arithmetic coding for the final output. Experimental results on several benchmark datasets compare the code size reduction of our methods with some published results.
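
To illustrate redundancy source (1), the following is a minimal sketch of textbook phase-in (truncated) binary coding compared with flat coding. It is not the formulation presented in the article, whose details the abstract does not give. The function names `flat_bits`, `phase_in_encode`, and `phase_in_decode` are hypothetical, and `n` is assumed to be the number of dictionary entries that can currently be coded.

```python
def flat_bits(n):
    """Bits per codeword under flat coding: ceil(log2 n)."""
    return (n - 1).bit_length()


def phase_in_encode(x, n):
    """Encode x in [0, n) as a bit string using a phase-in binary code.

    With k = floor(log2 n) and u = 2^(k+1) - n, the first u values get
    k-bit codewords and the remaining n - u values get (k+1)-bit codewords.
    """
    if not 0 <= x < n:
        raise ValueError("x must satisfy 0 <= x < n")
    if n == 1:
        return ""                      # a single possible value needs no bits
    k = n.bit_length() - 1             # floor(log2 n), at least 1 here
    u = (1 << (k + 1)) - n             # number of short codewords
    if x < u:
        return format(x, "0{}b".format(k))
    return format(x + u, "0{}b".format(k + 1))


def phase_in_decode(bits, n):
    """Decode one value from the front of bits; return (value, bits_used)."""
    if n == 1:
        return 0, 0
    k = n.bit_length() - 1
    u = (1 << (k + 1)) - n
    v = int(bits[:k], 2)
    if v < u:                          # short codeword
        return v, k
    v = int(bits[:k + 1], 2)           # long codeword: read one more bit
    return v - u, k + 1


if __name__ == "__main__":
    n = 5
    codes = [phase_in_encode(x, n) for x in range(n)]
    print(codes)
    print("flat bits per symbol:", flat_bits(n))
    print("phase-in average bits:", sum(len(c) for c in codes) / n)
```

When `n` is a power of two, the two schemes coincide. Otherwise flat coding leaves part of the code space unused, and the phase-in code reclaims it by shortening some codewords by one bit.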