Treatment of epsilon moves in subset construction

  • Authors: Gertjan van Noord

  • Affiliations: Rijksuniversiteit Groningen

  • Venue: Computational Linguistics - Special issue on finite-state methods in NLP
  • Year: 2000

Abstract

The paper discusses the problem of determinizing finite-state automata that contain large numbers of ε-moves. Experiments with finite-state approximations of natural language grammars often give rise to very large automata with a very large number of ε-moves. The paper identifies and compares a number of subset construction algorithms that treat ε-moves. Experiments indicate that the algorithms differ considerably in practice, both in the size of the resulting deterministic automaton and in practical efficiency. Furthermore, the experiments suggest that the average number of ε-moves per state can be used to predict which algorithm is likely to be the fastest for a given input automaton.
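
For readers unfamiliar with the technique, the following is a minimal sketch of one common textbook form of subset construction, in which the ε-closure is computed for each set of target states as it is discovered. It is not taken from the paper and is not necessarily any of the variants the paper compares. The function names (`epsilon_closure`, `determinize`), the dictionary encoding of the automaton, and the small example automaton are all assumptions made for illustration.

```python
from collections import deque


def epsilon_closure(states, eps):
    """Return all states reachable from `states` via zero or more ε-moves.

    `eps` maps a state to the set of states reachable by a single ε-move
    (an illustrative encoding, not the paper's).
    """
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def determinize(start, finals, delta, eps):
    """Subset construction with ε-closure applied to each target subset.

    `delta` maps state -> {symbol -> set of target states}.
    Returns (start subset, set of final subsets, deterministic transitions).
    """
    start_set = epsilon_closure({start}, eps)
    dfa_delta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        subset = queue.popleft()
        # Collect, per symbol, every state reachable from the subset.
        moves = {}
        for s in subset:
            for sym, targets in delta.get(s, {}).items():
                moves.setdefault(sym, set()).update(targets)
        for sym, targets in moves.items():
            nxt = epsilon_closure(targets, eps)
            dfa_delta[(subset, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # A subset is final if it contains at least one final state.
    dfa_finals = {q for q in seen if q & finals}
    return start_set, dfa_finals, dfa_delta


# Hypothetical example automaton: 0 -ε-> 1, 1 -a-> 2, 0 -b-> 2, 2 -ε-> 0.
eps = {0: {1}, 2: {0}}
delta = {1: {"a": {2}}, 0: {"b": {2}}}
dfa_start, dfa_finals, dfa_delta = determinize(0, {2}, delta, eps)
```

In this sketch the closure is recomputed for every target subset, so automata with many ε-moves per state incur repeated closure work. This is the kind of cost trade-off that the abstract says distinguishes the algorithms in practice.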