Several recently proposed data compression algorithms are based on the idea of representing a string by a context-free grammar. Most of these algorithms are known to be asymptotically optimal with respect to a stationary ergodic source and to achieve a low redundancy rate. However, such results do not reveal how effectively these algorithms exploit the grammar model itself: are the grammars they produce as small as possible? We address this question by analyzing the approximation ratio of several algorithms, that is, the maximum ratio, taken over all inputs, between the size of the generated grammar and the size of the smallest possible grammar. On the negative side, we show that every polynomial-time grammar-compression algorithm has approximation ratio at least 8569/8568 unless P = NP. Moreover, achieving an approximation ratio of o(log n / log log n) would require progress on an algebraic problem in a well-studied area. We then give upper and lower bounds on the approximation ratios of four previously proposed grammar-based compression algorithms: SEQUENTIAL, BISECTION, GREEDY, and LZ78. Each of these takes a distinct approach to compression. These results suggest that there is much room to improve grammar-based compression algorithms.
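To make "grammar size" and "approximation ratio" concrete, here is a minimal Python sketch. It assumes the following conventions, which are ours for illustration:

- A grammar is straight-line: every nonterminal has exactly one rule.
- Its size is the total length of all right-hand sides.
- Nonterminal 0 is the start symbol.

The helper names (`lz78_grammar`, `grammar_size`, `expand`) are hypothetical. The LZ78 conversion shown here is one straightforward way to turn an LZ78 parse into a grammar, not necessarily the exact construction analyzed above. Because computing the smallest grammar is hard, the sketch compares the LZ78 grammar only against a hand-built grammar. That comparison yields a lower bound on LZ78's ratio for one input, not the ratio itself.

```python
from typing import Dict, List, Union

# A symbol is a terminal (one-character str) or a nonterminal (int id).
Symbol = Union[str, int]
Grammar = Dict[int, List[Symbol]]  # nonterminal id -> right-hand side

START = 0  # assumed convention: nonterminal 0 is the start symbol


def grammar_size(g: Grammar) -> int:
    """Grammar size = total length of all right-hand sides."""
    return sum(len(rhs) for rhs in g.values())


def expand(g: Grammar, sym: Symbol = START) -> str:
    """Expand a straight-line grammar into the single string it generates."""
    if isinstance(sym, str):
        return sym
    return "".join(expand(g, s) for s in g[sym])


def lz78_grammar(text: str) -> Grammar:
    """Build a grammar from the LZ78 parse of `text`.

    Each new phrase is an earlier phrase plus one character, so it gets a
    rule X_i -> X_j c (or X_i -> c for a one-character phrase). The start
    rule lists the phrase nonterminals in parse order.
    """
    g: Grammar = {START: []}
    phrases: Dict[str, int] = {}  # phrase text -> its nonterminal id
    current = ""
    for ch in text:
        if current + ch in phrases:
            current += ch  # keep extending a phrase we have already seen
            continue
        nid = len(g)  # fresh nonterminal id
        prefix = [phrases[current]] if current else []
        g[nid] = prefix + [ch]
        phrases[current + ch] = nid
        g[START].append(nid)
        current = ""
    if current:  # input ended inside a known phrase: reuse its nonterminal
        g[START].append(phrases[current])
    return g


if __name__ == "__main__":
    text = "abababab"
    g = lz78_grammar(text)  # parse: a | b | ab | aba | b
    assert expand(g) == text
    print(grammar_size(g))  # 11

    # A smaller hand-built grammar for the same string: S -> A A, A -> B B, B -> a b
    hand: Grammar = {0: [1, 1], 1: [2, 2], 2: ["a", "b"]}
    assert expand(hand) == text
    print(grammar_size(hand))  # 6
```

The smallest grammar for "abababab" has size at most 6. Therefore, on this input, the LZ78-derived grammar is at least 11/6 times larger than optimal. Approximation-ratio lower bounds are argued in the same spirit: exhibit inputs on which an algorithm's grammar is provably much larger than some small grammar.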