IEEE Transactions on Information Theory
A grammar transform converts any data sequence to be compressed into a grammar from which the original data sequence can be fully reconstructed. In a grammar-based code, a data sequence is first converted into a grammar by a grammar transform, and the grammar is then losslessly encoded. In this paper, a greedy grammar transform is first presented; it sequentially constructs a sequence of irreducible grammars from which the original data sequence can be recovered incrementally. Based on this grammar transform, three universal lossless data compression algorithms are then developed: a sequential algorithm, an improved sequential algorithm, and a hierarchical algorithm. These algorithms combine the power of arithmetic coding with that of string matching. It is shown that all three algorithms are universal in the sense that they asymptotically achieve the entropy rate of any stationary, ergodic source. Moreover, it is proved that their worst-case redundancies among all individual sequences of length n are upper-bounded by c log log n / log n, where c is a constant. Simulation results show that the proposed algorithms outperform the Unix Compress and Gzip algorithms, which are based on LZ78 and LZ77, respectively.
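To make the idea of a grammar that fully reconstructs its data sequence concrete, here is a minimal Python sketch. It does not implement the greedy grammar transform described above; the grammar is a hand-built, hypothetical example, and the names `GRAMMAR`, `expand`, `grammar_size`, and `redundancy_bound` are illustrative only. The bound helper assumes natural logarithms and c = 1, neither of which is fixed by the abstract.

```python
import math

# Hypothetical toy grammar (not produced by the paper's transform).
# Keys are variables; any symbol that is not a key is a terminal.
GRAMMAR = {
    "S": ["A", "B", "A", "c"],  # start variable
    "A": ["B", "B"],
    "B": ["a", "b"],
}


def expand(grammar, symbol="S"):
    """Recursively expand a symbol into the terminal string it derives."""
    if symbol not in grammar:
        return symbol  # terminal symbol: emit as-is
    return "".join(expand(grammar, s) for s in grammar[symbol])


def grammar_size(grammar):
    """Total number of symbols on the right-hand sides of all rules."""
    return sum(len(rhs) for rhs in grammar.values())


def redundancy_bound(n, c=1.0):
    """Evaluate c * log log n / log n (natural log and c=1 are assumptions)."""
    return c * math.log(math.log(n)) / math.log(n)


if __name__ == "__main__":
    text = expand(GRAMMAR)
    print(text, len(text), grammar_size(GRAMMAR))
    for n in (10**3, 10**6, 10**9):
        print(n, round(redundancy_bound(n), 4))
```

In this toy case the grammar has 8 right-hand-side symbols while the sequence it derives has 11, which hints at where the compression comes from; the bound shrinks slowly as n grows.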