Estimation of stochastic context-free grammars and their use as language models

  • Authors:
  • J. M. Benedí; J. A. Sánchez

  • Affiliations:
  • Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain (both authors)

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2005

Abstract

This paper is devoted to the estimation of stochastic context-free grammars (SCFGs) and their use as language models. Classical estimation algorithms, together with new ones that consider a certain subset of derivations in the estimation process, are presented in a unified framework. This subset of derivations is chosen according to both structural and statistical criteria. The estimated SCFGs are used in a new hybrid language model that combines a word-based n-gram, which captures the local relations between words, with a category-based SCFG, which represents the long-term relations between categories, together with a distribution of words into those categories. We describe methods for learning these stochastic models for complex tasks, and we present an algorithm for computing the word transition probability using this hybrid language model. Finally, experiments on the UPenn Treebank corpus show significant improvements in test-set perplexity compared with classical word trigram models.
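Classical SCFG estimation (for example, the inside-outside algorithm) rests on computing the probability that a grammar assigns to a string. The following is a minimal sketch, not the authors' implementation, of the inside algorithm for an SCFG in Chomsky normal form, followed by a per-word test-set perplexity computation. The toy grammar (`BINARY`, `LEXICAL`) and every function name are hypothetical. The abstract does not state how the n-gram and SCFG components are combined; `hybrid_word_prob` assumes a simple linear interpolation with a hypothetical weight `alpha`, purely for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy SCFG in Chomsky normal form (not from the paper).
# Binary rules A -> B C and lexical rules A -> a, each with a probability;
# the rules for every nonterminal sum to 1.
BINARY = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 1.0,
}
LEXICAL = {
    ("NP", "she"): 0.5,
    ("NP", "fish"): 0.5,
    ("V", "eats"): 1.0,
}


def inside_probability(words, binary, lexical, start="S"):
    """Return Pr(words | G) computed with the inside algorithm."""
    n = len(words)
    # e[(i, j)][A] holds the inside probability Pr(A =>* words[i:j]).
    e = defaultdict(lambda: defaultdict(float))
    # Spans of length 1 come from lexical rules.
    for i, w in enumerate(words):
        for (a, term), p in lexical.items():
            if term == w:
                e[(i, i + 1)][a] += p
    # Longer spans: sum over split points k and binary rules A -> B C.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, (b, c)), p in binary.items():
                    e[(i, j)][a] += p * e[(i, k)][b] * e[(k, j)][c]
    return e[(0, n)][start]


def perplexity(sentences, prob_fn):
    """Per-word perplexity: exp(-sum(log Pr(x)) / total number of words)."""
    log_sum = 0.0
    n_words = 0
    for s in sentences:
        p = prob_fn(s)
        if p <= 0.0:
            raise ValueError(f"zero probability for sentence {s!r}")
        log_sum += math.log(p)
        n_words += len(s)
    return math.exp(-log_sum / n_words)


def hybrid_word_prob(p_ngram, p_scfg, alpha):
    """Assumed linear interpolation of n-gram and SCFG word probabilities."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_ngram + (1.0 - alpha) * p_scfg
```

For the toy grammar, `inside_probability(["she", "eats", "fish"], BINARY, LEXICAL)` gives 0.25, so the perplexity of that one-sentence test set is 0.25^(-1/3), about 1.587. Perplexity here counts words only, with no end-of-sentence token; conventions vary.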