Choosing word occurrences for the smallest grammar problem

  • Authors:
  • Rafael Carrascosa;François Coste;Matthias Gallé;Gabriel Infante-Lopez

  • Affiliations:
  • Grupo de Procesamiento de Lenguaje Natural, Universidad Nacional de Córdoba, Argentina;Symbiose Project, IRISA/INRIA Rennes-Bretagne Atlantique, France;Symbiose Project, IRISA/INRIA Rennes-Bretagne Atlantique, France;Grupo de Procesamiento de Lenguaje Natural, Universidad Nacional de Córdoba, Argentina

  • Venue:
  • LATA'10 Proceedings of the 4th international conference on Language and Automata Theory and Applications
  • Year:
  • 2010

Abstract

The smallest grammar problem, namely finding a smallest context-free grammar that generates exactly one sequence, is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. We focus on the choice of the occurrences to be rewritten by non-terminals. We extend classical offline algorithms by introducing a global optimization of this choice at each step of the algorithm. This approach allows us to define the search space of a smallest grammar by separating the choice of the non-terminals from the choice of their occurrences. We propose a second algorithm that performs a broader exploration by allowing the removal of useless words that were chosen previously. Experiments on a classical benchmark show that our algorithms consistently find smaller grammars than state-of-the-art algorithms.
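
To make the idea of choosing occurrences concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes the set of words (the future non-terminals) is already fixed, and it uses dynamic programming to pick which occurrences to rewrite so that each right-hand side has as few symbols as possible. The function names `minimal_parse`, `build_grammar` and `grammar_size` are hypothetical, and grammar size is taken here to be the total length of all right-hand sides, which is one common convention and may differ from the paper's.

```python
def minimal_parse(target, words):
    """Split `target` into the fewest tokens, where each token is either a
    single symbol or one of `words` (a word never parses as itself)."""
    n = len(target)
    best = [0] * (n + 1)     # best[i]: fewest tokens covering target[:i]
    back = [0] * (n + 1)     # back[i]: start index of the last token
    for i in range(1, n + 1):
        # Default choice: end with a single terminal symbol.
        best[i] = best[i - 1] + 1
        back[i] = i - 1
        for w in words:
            j = i - len(w)
            if not w or w == target or j < 0:
                continue
            if target[j:i] == w and best[j] + 1 < best[i]:
                best[i] = best[j] + 1
                back[i] = j
    # Walk the back-pointers to recover the chosen tokens.
    tokens = []
    i = n
    while i > 0:
        j = back[i]
        tokens.append(target[j:i])
        i = j
    return tokens[::-1]


def build_grammar(sequence, words):
    """Return (start_rhs, rules): the start rule's right-hand side and one
    rule per word, each parsed with the other words."""
    start_rhs = minimal_parse(sequence, words)
    rules = {w: minimal_parse(w, words) for w in words}
    return start_rhs, rules


def grammar_size(start_rhs, rules):
    """Assumed size measure: total number of right-hand-side symbols."""
    return len(start_rhs) + sum(len(rhs) for rhs in rules.values())


if __name__ == "__main__":
    # Rewriting the leftmost occurrence first would give ab, c, d (3 tokens);
    # the global choice prefers a, bcd (2 tokens).
    print(minimal_parse("abcd", {"ab", "bcd"}))
    start, rules = build_grammar("abcabcabc", {"abc"})
    print(start, rules, grammar_size(start, rules))
```

In this sketch the choice of words and the choice of their occurrences are separate steps: a search procedure would add or remove entries in `words`, and `minimal_parse` would then decide, for that fixed set, which occurrences to rewrite.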