Performance of a SCFG-based language model with training data sets of increasing size

  • Authors:
  • Joan Andreu Sánchez
  • José Miguel Benedí
  • Diego Linares

  • Affiliations:
  • Joan Andreu Sánchez, José Miguel Benedí: Depto. Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Valencia, Spain
  • Diego Linares: Pontificia Universidad Javeriana – Cali, Cali, Colombia

  • Venue:
  • IbPRIA'05: Proceedings of the Second Iberian Conference on Pattern Recognition and Image Analysis, Volume Part II
  • Year:
  • 2005


Abstract

In this paper, a hybrid language model that combines a word-based n-gram and a category-based Stochastic Context-Free Grammar (SCFG) is evaluated on training data sets of increasing size. Different estimation algorithms for learning SCFGs in General Format and in Chomsky Normal Form are considered. Experiments on the UPenn Treebank corpus are reported, with results given in terms of test-set perplexity and of word error rate in a speech recognition task.
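A common way to realize such a hybrid model is to linearly interpolate the two component probabilities and then score a test set by its perplexity. The sketch below illustrates this idea only; the interpolation weight, the probability values, and the function names are illustrative assumptions, not details taken from the paper.

```python
import math

def hybrid_prob(p_ngram, p_scfg, lam=0.5):
    """Interpolated word probability: lam is an assumed mixing weight
    between the word-based n-gram and the category-based SCFG model."""
    return lam * p_ngram + (1.0 - lam) * p_scfg

def perplexity(word_probs):
    """Test-set perplexity from per-word probabilities:
    exp of the negative average log-probability."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

# Toy usage with made-up component probabilities for a 3-word test set.
probs = [hybrid_prob(0.4, 0.2), hybrid_prob(0.1, 0.3), hybrid_prob(0.25, 0.25)]
pp = perplexity(probs)
```
A lower perplexity on held-out text indicates that the interpolated model assigns higher probability to the test set, which is the evaluation criterion the experiments use alongside word error rate.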