This paper addresses the smallest grammar problem: What is the smallest context-free grammar that generates exactly one given string σ? This is a natural question about a fundamental object connected to many fields such as data compression, Kolmogorov complexity, pattern identification, and addition chains. Due to the problem's inherent complexity, our objective is to find an approximation algorithm which finds a small grammar for the input string. We focus attention on the approximation ratio of the algorithm (and implicitly, the worst-case behavior) to establish provable performance guarantees and to address shortcomings in the classical measure of redundancy in the literature. Our first results concern the hardness of approximating the smallest grammar problem. Most notably, we show that every efficient algorithm for the smallest grammar problem has approximation ratio at least 8569/8568 unless P=NP. We then bound approximation ratios for several of the best-known grammar-based compression algorithms, including LZ78, BISECTION, SEQUENTIAL, LONGEST MATCH, GREEDY, and RE-PAIR. Among these, the best upper bound we show is O(n^{1/2}). We finish by presenting two novel algorithms with exponentially better ratios of O(log^3 n) and O(log(n/m*)), where m* is the size of the smallest grammar for that input. The latter algorithm highlights a connection between grammar-based compression and LZ77.
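To make the object of study concrete, here is a minimal sketch of a context-free grammar that generates exactly one string. It is an illustration only, not one of the algorithms analyzed in the paper. The dictionary representation, the helper names `expand` and `grammar_size`, and the example grammar are all assumptions introduced here; grammar size is taken to be the total number of symbols on the right-hand sides of the rules, which is one common convention.

```python
from typing import Dict, List

# Hypothetical representation: each nonterminal maps to the single
# right-hand side of its rule; any symbol without a rule is a terminal.
Grammar = Dict[str, List[str]]


def expand(grammar: Grammar, symbol: str) -> str:
    """Return the unique string derived from `symbol`."""
    if symbol not in grammar:  # terminal: derives itself
        return symbol
    return "".join(expand(grammar, s) for s in grammar[symbol])


def grammar_size(grammar: Grammar) -> int:
    """Total number of symbols on all right-hand sides (assumed size measure)."""
    return sum(len(rhs) for rhs in grammar.values())


# Example grammar for the string "abababab" (length 8):
#   S -> B B,  B -> A A,  A -> a b
example: Grammar = {"S": ["B", "B"], "B": ["A", "A"], "A": ["a", "b"]}

if __name__ == "__main__":
    print(expand(example, "S"))   # the single string the grammar generates
    print(grammar_size(example))  # size of the grammar under the assumed measure
```

Under this measure, the example grammar has size 6 while the string it generates has length 8. The smallest grammar problem asks for the minimum such size over all grammars generating the given string.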