Approximating the smallest grammar: Kolmogorov complexity in natural models

  • Authors:
  • Moses Charikar (Princeton University); Eric Lehman (MIT); Ding Liu (Princeton University); Rina Panigrahy (Cisco Systems); Manoj Prabhakaran (Princeton University); April Rasala (MIT); Amit Sahai (Princeton University); abhi shelat (MIT)

  • Venue:
  • STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
  • Year:
  • 2002

Abstract

We consider the problem of finding the smallest context-free grammar that generates exactly one given string of length n. The size of this grammar is of theoretical interest as an efficiently computable variant of Kolmogorov complexity. The problem is of practical importance in areas such as data compression and pattern extraction.

The smallest grammar is known to be hard to approximate to within a constant factor, and an o(log n / log log n) approximation would require progress on a long-standing algebraic problem [10]. Previously, the best proved approximation ratio was O(n^(1/2)), achieved by the Bisection algorithm [8]. Our main result is an exponential improvement of this ratio: we give an O(log(n/g*)) approximation algorithm, where g* is the size of the smallest grammar.

We then consider other computable variants of Kolmogorov complexity. In particular, we give an O(log^2 n) approximation for the smallest non-deterministic finite automaton with advice that produces a given string. We also apply our techniques to "advice-grammars" and "edit-grammars", two other natural models of string complexity.
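To make the object of study concrete: a grammar that generates exactly one string is a straight-line grammar, and its size is the total length of all right-hand sides. The sketch below is not the paper's algorithm; it is a simple greedy heuristic (in the spirit of Re-Pair) that builds such a grammar by repeatedly replacing the most frequent adjacent pair of symbols with a fresh nonterminal, so one can see what "grammar size" means for a given string.

```python
# Illustrative sketch only (a Re-Pair-style greedy heuristic, NOT the
# paper's approximation algorithm): build a straight-line grammar that
# generates exactly one given string, and measure its size.

from collections import Counter


def repair_grammar(s):
    """Return a dict mapping each nonterminal to its right-hand side.

    The start symbol is "S"; fresh nonterminals are named N0, N1, ...
    """
    seq = list(s)
    rules = {}
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so a new rule cannot shrink the grammar
        nt = f"N{next_id}"
        next_id += 1
        rules[nt] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    rules["S"] = tuple(seq)
    return rules


def grammar_size(rules):
    """Grammar size = total length of all right-hand sides."""
    return sum(len(rhs) for rhs in rules.values())


def expand(rules, sym="S"):
    """Expand a symbol back into the unique string it derives."""
    if sym not in rules:
        return sym  # terminal character
    return "".join(expand(rules, x) for x in rules[sym])
```

Because the derived string is unique, `expand(rules)` must reproduce the input exactly; on repetitive inputs the grammar is much smaller than the string, while on a string with no repeated pairs the heuristic adds no rules at all. The greedy choice here carries no approximation guarantee, which is precisely the gap the paper's O(log(n/g*)) algorithm addresses.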