A Compression Method for Natural Language Automata

  • Authors:
  • Lamia Tounsi;Béatrice Bouchou;Denis Maurel

  • Affiliations:
  • Université François Rabelais Tours, LI, lamia.tounsi@computing.dcu.ie, {beatrice.bouchou, denis.maurel}@univ-tours.fr

  • Venue:
  • Proceedings of the 2009 conference on Finite-State Methods and Natural Language Processing: Post-proceedings of the 7th International Workshop FSMNLP 2008
  • Year:
  • 2009

Abstract

This paper deals with finite-state automata used in Natural Language Processing to represent very large dictionaries. We present a method for an important operation on these automata: compression that preserves quick access. Our proposal is to factorize subautomata other than those representing common prefixes or suffixes. Our algorithm uses a DAWG (directed acyclic word graph) of subautomata to iteratively choose the best substructure to factorize. The resulting compact automaton keeps linear-time acceptance complexity. Experiments performed on ten automata are reported.
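
For readers unfamiliar with the setting, the following minimal sketch illustrates the baseline that the abstract contrasts itself with: a dictionary automaton in which common prefixes are shared by a trie and common suffixes are shared by merging equivalent states, with lookup in time linear in the word length. It is not the authors' subautomaton factorization, which the abstract does not specify in detail. All names (`State`, `build_trie`, `minimize`, `accepts`, `count_states`) and the sample word list are assumptions made for illustration.

```python
class State:
    """One automaton state: a final flag and labelled outgoing edges."""
    __slots__ = ("final", "edges")

    def __init__(self):
        self.final = False
        self.edges = {}


def build_trie(words):
    """Build a trie, which shares common prefixes only."""
    root = State()
    for word in words:
        node = root
        for ch in word:
            node = node.edges.setdefault(ch, State())
        node.final = True
    return root


def minimize(state, registry):
    """Merge equivalent states bottom-up, which shares common suffixes.

    Two states are equivalent when they agree on finality and on their
    (label, canonical target) pairs. `registry` maps each such signature
    to the single state kept for it. The automaton is modified in place.
    """
    for ch, child in list(state.edges.items()):
        state.edges[ch] = minimize(child, registry)
    signature = (
        state.final,
        tuple(sorted((ch, id(t)) for ch, t in state.edges.items())),
    )
    return registry.setdefault(signature, state)


def accepts(root, word):
    """Follow one edge per character: time linear in len(word)."""
    node = root
    for ch in word:
        node = node.edges.get(ch)
        if node is None:
            return False
    return node.final


def count_states(root):
    """Count distinct states reachable from root."""
    seen = set()
    stack = [root]
    while stack:
        s = stack.pop()
        if id(s) in seen:
            continue
        seen.add(id(s))
        stack.extend(s.edges.values())
    return len(seen)


if __name__ == "__main__":
    words = ["tap", "taps", "top", "tops"]  # illustrative word list
    automaton = build_trie(words)
    before = count_states(automaton)
    automaton = minimize(automaton, {})
    after = count_states(automaton)
    print(before, after, accepts(automaton, "tops"))
```

Merging equivalent states only captures shared prefixes and suffixes. Repeated structure in the middle of the automaton is left untouched by this sketch, and that is the kind of substructure the abstract proposes to factorize.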