The smallest grammar problem, namely finding a smallest context-free grammar that generates exactly one given sequence, is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression, and pattern discovery. We focus on the choice of the occurrences to be rewritten by non-terminals. We extend classical offline algorithms by introducing a global optimization of this choice at each step of the algorithm. This approach allows us to define the search space of a smallest grammar by separating the choice of the non-terminals from the choice of their occurrences. We also propose a second algorithm that performs a broader exploration by allowing the removal of previously chosen words that have become useless. Experiments on a classical benchmark show that our algorithms consistently find smaller grammars than state-of-the-art algorithms.
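To make the setting concrete, the following is a minimal sketch of a classical offline heuristic of the kind the abstract says it extends. It is not the algorithm proposed here: it is a Re-Pair-style greedy that repeatedly replaces the most frequent adjacent pair and always rewrites every occurrence from left to right, which is exactly the fixed occurrence choice the abstract proposes to optimize. The function names, the `N0`, `N1`, ... naming of non-terminals, and the size measure (total length of all right-hand sides) are assumptions made for illustration.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair, scanning left to right."""
    counts = Counter()
    last = {}  # pair -> start index of its last counted occurrence
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last.get(pair) == i - 1:
            continue  # overlaps the previous occurrence (e.g. "aaa")
        counts[pair] += 1
        last[pair] = i
    return counts


def replace_pair(seq, pair, symbol):
    """Rewrite every left-to-right occurrence of `pair` in `seq` by `symbol`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def greedy_grammar(text):
    """Return (start, rules): the start rule's right-hand side and the other rules."""
    seq, rules = list(text), {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break  # no repeated pair left to factor out
        symbol = f"N{len(rules)}"  # multi-character, so distinct from terminals
        rules[symbol] = list(pair)
        seq = replace_pair(seq, pair, symbol)
    return seq, rules


def grammar_size(start, rules):
    """Assumed size measure: total length of all right-hand sides."""
    return len(start) + sum(len(rhs) for rhs in rules.values())


def expand(seq, rules):
    """Derive the unique string generated from a sequence of symbols."""
    return "".join(expand(rules[s], rules) if s in rules else s for s in seq)
```

For the input `"abababab"` this sketch produces a grammar of size 6 against a sequence length of 8, and `expand` confirms that the grammar generates exactly the input. Separating the choice of which words become non-terminals from the choice of which occurrences are rewritten, as the abstract describes, would replace the fixed left-to-right policy in `replace_pair` with an optimization step.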