Choosing word occurrences for the smallest grammar problem

Authors:
Rafael Carrascosa, François Coste, Matthias Gallé, Gabriel Infante-Lopez
Affiliations:
Grupo de Procesamiento de Lenguaje Natural, Universidad Nacional de Córdoba, Argentina (Rafael Carrascosa, Gabriel Infante-Lopez); Symbiose Project, IRISA/INRIA Rennes-Bretagne Atlantique, France (François Coste, Matthias Gallé)
Venue:
LATA'10: Proceedings of the 4th International Conference on Language and Automata Theory and Applications
Year:
2010

Abstract

The smallest grammar problem (namely, finding a smallest context-free grammar that generates exactly one given sequence) is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. We propose to focus on the choice of the occurrences to be rewritten by non-terminals. We extend classical offline algorithms by introducing a global optimization of this choice at each step of the algorithm. This approach allows us to define the search space of a smallest grammar by separating the choice of the non-terminals from the choice of their occurrences. We propose a second algorithm that performs a broader exploration by allowing the removal of useless, previously chosen words. Experiments on a classical benchmark show that our algorithms consistently find smaller grammars than state-of-the-art algorithms.
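The abstract separates choosing the words that become non-terminals from choosing the occurrences they rewrite. The following is a minimal sketch of the second choice only. It assumes that grammar size is the total number of symbols on all right-hand sides and that every chosen word gets its own rule. Under those assumptions, once the word set is fixed, each rule can be optimized independently by a fewest-symbol parse (a dynamic program over positions), and the sum of those optima is the smallest grammar for that word set. The names `parse_word`, `minimal_grammar`, `expand`, `repeated_words` and `greedy_words`, as well as the brute-force greedy outer loop, are hypothetical illustrations, not the authors' algorithms or code.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: a non-terminal is named by the word it spells;
# single characters on a right-hand side are terminals.
Grammar = Dict[str, List[str]]


def parse_word(word: str, words: Set[str]) -> List[str]:
    """Fewest-symbol right-hand side spelling `word` with terminals and other words."""
    n = len(word)
    best = [0] * (n + 1)  # best[i]: fewest symbols needed to spell word[:i]
    back: List[Tuple[int, str]] = [(0, "")] * (n + 1)
    for i in range(1, n + 1):
        # Default choice: end the prefix with the terminal word[i-1].
        best[i], back[i] = best[i - 1] + 1, (i - 1, word[i - 1])
        # Alternative: end the prefix with an occurrence of another word u.
        # Any matching u != word is strictly shorter, so rules never form cycles.
        for u in words:
            k = len(u)
            if u != word and 2 <= k <= i and word[i - k:i] == u and best[i - k] + 1 < best[i]:
                best[i], back[i] = best[i - k] + 1, (i - k, u)
    rhs: List[str] = []
    i = n
    while i > 0:  # follow back-pointers to recover the chosen occurrences
        i, symbol = back[i]
        rhs.append(symbol)
    return rhs[::-1]


def minimal_grammar(sequence: str, words: Set[str]) -> Grammar:
    """One rule per chosen word (length >= 2) plus the axiom, each optimally parsed."""
    nonterminals = {w for w in words if len(w) >= 2} | {sequence}
    return {w: parse_word(w, nonterminals) for w in nonterminals}


def grammar_size(grammar: Grammar) -> int:
    """Total number of symbols over all right-hand sides."""
    return sum(len(rhs) for rhs in grammar.values())


def expand(symbol: str, grammar: Grammar) -> str:
    """String derived from a symbol; single characters are terminals."""
    if len(symbol) == 1:
        return symbol
    return "".join(expand(s, grammar) for s in grammar[symbol])


def repeated_words(sequence: str) -> Set[str]:
    """All substrings of length >= 2 that occur at least twice (overlaps allowed)."""
    counts: Dict[str, int] = {}
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence) + 1):
            counts[sequence[i:j]] = counts.get(sequence[i:j], 0) + 1
    return {w for w, c in counts.items() if c >= 2}


def greedy_words(sequence: str) -> Set[str]:
    """Hypothetical greedy loop: add the repeat whose re-optimized grammar is smallest."""
    words: Set[str] = set()
    size = grammar_size(minimal_grammar(sequence, words))
    candidates = repeated_words(sequence)
    while True:
        best_word: Optional[str] = None
        best_size = size
        for w in sorted(candidates - words):  # sorted for deterministic tie-breaking
            s = grammar_size(minimal_grammar(sequence, words | {w}))
            if s < best_size:
                best_word, best_size = w, s
        if best_word is None:  # no candidate shrinks the grammar any further
            return words
        words.add(best_word)
        size = best_size
```

On `abcabcabc`, the word set {abc} gives a grammar of size 6 (axiom `abc abc abc` plus the rule `abc -> a b c`), while {abc, abcabc} gives size 7: the longer word saves one symbol in the axiom but costs a two-symbol rule of its own. This small case hints at why removing previously chosen words can pay off. The brute-force candidate enumeration is only meant for toy inputs.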