Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform. I. Without context models

Authors:
En-Hui Yang; J. C. Kieffer
Affiliations:
Dept. of Electr. & Comput. Eng., Waterloo Univ., Ont.;-
Venue:
IEEE Transactions on Information Theory
Year:
2000

Citing 0
Cited 17

Estimating DNA sequence entropy
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms

Approximating the smallest grammar: Kolmogorov complexity in natural models
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing

Approximation algorithms for grammar-based compression
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms

Collage system: a unifying framework for compressed pattern matching
Theoretical Computer Science - Selected papers in honour of Setsuo Arikawa

Applications of YK Algorithm to the Internet Transmission of Web-Data: Implementation Issues and Modifications
DCC '00 Proceedings of the Conference on Data Compression

Architecture for Efficient Implementation of the YK Lossless Data Compression Algorithm
DCC '01 Proceedings of the Data Compression Conference

Lossless Compression for Satellite Packet Networks Using the YK Algorithm
DCC '01 Proceedings of the Data Compression Conference

Data Coding by Linear Forms of Numerical Sequences
Cybernetics and Systems Analysis

SGA: A grammar-based alignment algorithm
Computer Methods and Programs in Biomedicine

Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition
The Journal of Machine Learning Research

A fully linear-time approximation algorithm for grammar-based compression
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching

Interactive encoding and decoding for one way learning: near lossless recovery with side information at the decoder
IEEE Transactions on Information Theory

Random access to grammar-compressed strings
Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms

Improving time and space complexity for compressed pattern matching
ISAAC'06 Proceedings of the 17th international conference on Algorithms and Computation

ESP-index: A compressed index based on edit-sensitive parsing
Journal of Discrete Algorithms

An effective heuristic for the smallest grammar problem
Proceedings of the 15th annual conference on Genetic and evolutionary computation

Discrete Tomography Data Footprint Reduction via Natural Compression
Fundamenta Informaticae - Strategies for Tomography

Abstract

A grammar transform is a transformation that converts any data sequence to be compressed into a grammar from which the original data sequence can be fully reconstructed. In a grammar-based code, a data sequence is first converted into a grammar by a grammar transform and then losslessly encoded. In this paper, a greedy grammar transform is first presented; this grammar transform sequentially constructs a sequence of irreducible grammars from which the original data sequence can be recovered incrementally. Based on this grammar transform, three universal lossless data compression algorithms, a sequential algorithm, an improved sequential algorithm, and a hierarchical algorithm, are then developed. These algorithms combine the power of arithmetic coding with that of string matching. It is shown that these algorithms are all universal in the sense that they can asymptotically achieve the entropy rate of any stationary, ergodic source. Moreover, it is proved that their worst-case redundancies among all individual sequences of length n are upper-bounded by c log log n / log n, where c is a constant. Simulation results show that the proposed algorithms outperform the Unix Compress and Gzip algorithms, which are based on LZ78 and LZ77, respectively.
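
To make the notion of a grammar transform concrete, the following is a minimal Python sketch of a toy transform and its inverse. It is an illustration only: it uses a simple Re-Pair-style pair-replacement heuristic, not the greedy sequential transform of the paper, and it does not produce irreducible grammars or perform arithmetic coding. The function names `toy_grammar_transform` and `expand`, and the use of integers as variable names, are assumptions made for this example.

```python
from collections import Counter


def toy_grammar_transform(seq):
    """Turn a string into (start_rule, rules) by repeated pair replacement.

    Illustrative Re-Pair-style heuristic, NOT the paper's greedy transform.
    Terminals are single-character strings; variables are integers.
    """
    symbols = list(seq)
    rules = {}  # variable (int) -> pair of symbols it stands for
    while len(symbols) >= 2:
        pair_counts = Counter(zip(symbols, symbols[1:]))
        pair, count = pair_counts.most_common(1)[0]
        if count < 2:
            break  # no adjacent pair repeats, so stop
        var = len(rules)  # fresh variable name (hypothetical convention)
        rules[var] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(var)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols, rules


def expand(symbols, rules):
    """Reconstruct the original string from a start rule and the rule table."""
    out = []
    for s in symbols:
        if s in rules:
            out.append(expand(rules[s], rules))
        else:
            out.append(s)
    return "".join(out)


if __name__ == "__main__":
    start, rules = toy_grammar_transform("abababab")
    print(start, rules)
    print(expand(start, rules))
```

The point of the sketch is the round trip: whatever grammar the transform produces, expanding its start rule must reproduce the input exactly, which is the lossless property the abstract requires of any grammar transform.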