LZW is a widely used text compression algorithm, and it has long been known that its flat coding method introduces redundancy. In this article, we identify three sources of that redundancy: (1) flat coding does not use the code space fully; (2) LZW considers the entire dictionary, including leaf nodes, when computing the codeword size, whereas leaf nodes would not be encoded; and (3) LZW assigns an index to a node well before the node can be used for output. For (1), we present an elegant implementation of an approach known as phase-in binary coding. Our formulation incurs virtually no overhead and can also be applied to integer sequence coding. For (2) and (3), we assign double indices to dictionary nodes to identify the nodes that should not be considered for coding. Our method generates some extra code symbols; to reduce their coding cost, we use arithmetic coding for the final output. Experimental results on several benchmark datasets compare the code size reduction of our methods with some published results.
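To make redundancy source (1) concrete, the following is a minimal sketch of the textbook phase-in (truncated) binary code for an index x drawn from a dictionary of n entries. It is not the formulation presented in the article, which is not given in this abstract; the function names `flat_bits`, `phase_in_encode`, and `phase_in_decode` are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def flat_bits(n: int) -> int:
    """Bits per codeword under flat coding for a dictionary of n entries."""
    return (n - 1).bit_length()  # ceil(log2 n) for n >= 1


def phase_in_encode(x: int, n: int) -> str:
    """Encode x in [0, n) as a bit string using a phase-in binary code.

    Illustrative textbook version (an assumption, not the article's method):
    the first u indices get k bits, the remaining ones get k + 1 bits.
    """
    if not 0 <= x < n:
        raise ValueError("x must satisfy 0 <= x < n")
    if n == 1:
        return ""  # a single possible value needs no bits
    k = n.bit_length() - 1      # floor(log2 n)
    u = (1 << (k + 1)) - n      # number of short (k-bit) codewords
    if x < u:
        return format(x, "0{}b".format(k))
    return format(x + u, "0{}b".format(k + 1))


def phase_in_decode(bits: str, pos: int, n: int) -> Tuple[int, int]:
    """Decode one index from bits starting at pos; return (index, new_pos)."""
    if n == 1:
        return 0, pos
    k = n.bit_length() - 1
    u = (1 << (k + 1)) - n
    v = int(bits[pos:pos + k], 2)
    if v < u:
        return v, pos + k
    # Long codeword: read one more bit, then undo the offset u.
    v = (v << 1) | int(bits[pos + k])
    return v - u, pos + k + 1


if __name__ == "__main__":
    n = 5
    for x in range(n):
        print(x, phase_in_encode(x, n), "flat:", flat_bits(n), "bits")
```

For n = 5, flat coding spends 3 bits on every index, while this sketch assigns 2 bits to indices 0 through 2 and 3 bits to indices 3 and 4, which averages 2.4 bits for uniformly distributed indices. When n is a power of two, the two schemes coincide.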