Inside-outside reestimation from partially bracketed corpora

  • Authors:
  • Fernando Pereira; Yves Schabes

  • Affiliations:
  • AT&T Bell Laboratories, Murray Hill, NJ; University of Pennsylvania, Philadelphia, PA

  • Venue:
  • HLT '91: Proceedings of the Workshop on Speech and Natural Language
  • Year:
  • 1992

Abstract

The inside-outside algorithm for inferring the parameters of a stochastic context-free grammar is extended to take advantage of constituent information in a partially parsed corpus. Experiments on formal-language and natural-language parsed corpora show that the new algorithm can achieve faster convergence and better modelling of hierarchical structure than the original one. In particular, for a grammar trained on a corpus of 700 hand-parsed part-of-speech strings for ATIS sentences, over 90% of the constituents in the most likely analyses of a test set are compatible with the test set's constituents.
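The core idea of the extension, restricting reestimation to constituents that agree with the corpus bracketing, can be illustrated with a small sketch. The code below is an illustrative assumption, not the authors' implementation. It assumes a grammar in Chomsky normal form stored as hypothetical `binary` and `lexical` probability dictionaries, and brackets encoded as half-open word spans `(i, j)`. It shows only the inside pass: a span receives nonzero inside probability only if it does not cross any given bracket. Under the same assumption, the outside pass and expected-count collection would apply the same span filter. The `compatible` test is also the kind of crossing-bracket check that the abstract's "compatible with test set constituents" figure refers to.

```python
from collections import defaultdict


def compatible(i, j, brackets):
    """Return True if span [i, j) crosses none of the (k, l) brackets."""
    for k, l in brackets:
        # Two spans cross when they overlap but neither contains the other.
        if i < k < j < l or k < i < l < j:
            return False
    return True


def bracketed_inside(words, binary, lexical, brackets):
    """Inside probabilities of a CNF SCFG, restricted to bracket-compatible spans.

    binary:   hypothetical dict mapping (A, B, C) -> P(A -> B C)
    lexical:  hypothetical dict mapping (A, w)    -> P(A -> w)
    brackets: iterable of half-open spans (i, j) from a partial parse
    Returns inside[(i, j)][A], the total probability that A derives words[i:j]
    using only constituents compatible with the bracketing.
    """
    brackets = list(brackets)
    n = len(words)
    inside = defaultdict(lambda: defaultdict(float))

    # Single-word spans can never cross a bracket, so no check is needed here.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                inside[(i, i + 1)][a] += p

    # Build longer spans bottom-up, as in the standard CKY-style inside pass.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            if not compatible(i, j, brackets):
                continue  # incompatible spans keep zero inside probability
            for k in range(i + 1, j):
                left, right = inside[(i, k)], inside[(k, j)]
                if not left or not right:
                    continue  # one side has no analysis (or was filtered out)
                for (a, b, c), p in binary.items():
                    if b in left and c in right:
                        inside[(i, j)][a] += p * left[b] * right[c]
    return inside
```

As a usage example, take the ambiguous grammar with P(S → S S) = 0.5 and P(S → a) = 0.5 and the string "a a a". Without brackets, both binary trees contribute, giving an inside probability of 0.0625 for S over the whole string. With the single bracket (0, 2), the span (1, 3) crosses it and is discarded, so only ((a a) a) survives and the probability drops to 0.03125. This shows how partial bracketing prunes analyses before any counts are collected.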