Performance of a SCFG-based language model with training data sets of increasing size

  • Authors:
  • Joan Andreu Sánchez
  • José Miguel Benedí
  • Diego Linares

  • Affiliations:
  • Joan Andreu Sánchez, José Miguel Benedí: Depto. Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Valencia, Spain
  • Diego Linares: Pontificia Universidad Javeriana – Cali, Cali, Colombia

  • Venue:
  • IbPRIA'05: Proceedings of the Second Iberian Conference on Pattern Recognition and Image Analysis, Volume Part II
  • Year:
  • 2005


Abstract

In this paper, a hybrid language model that combines a word-based n-gram and a category-based Stochastic Context-Free Grammar (SCFG) is evaluated on training data sets of increasing size. Different estimation algorithms for learning SCFGs in General Format and in Chomsky Normal Form are considered. Experiments on the UPenn Treebank corpus are reported, with results given in terms of test-set perplexity and of word error rate in a speech recognition task.
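A common way to realize such a hybrid model is to linearly interpolate the two component probabilities and then score a test set by its perplexity. The sketch below illustrates this idea only; the interpolation weight, the probability values, and the function names are illustrative assumptions, not details taken from the paper.

```python
import math

def hybrid_prob(p_ngram, p_scfg, lam=0.5):
    """Interpolated word probability: lam is an assumed mixing weight
    between the word-based n-gram and the category-based SCFG model."""
    return lam * p_ngram + (1.0 - lam) * p_scfg

def perplexity(word_probs):
    """Test-set perplexity from per-word probabilities:
    exp of the negative average log-probability."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

# Toy usage with made-up component probabilities for a 3-word test set.
probs = [hybrid_prob(0.4, 0.2), hybrid_prob(0.1, 0.3), hybrid_prob(0.25, 0.25)]
pp = perplexity(probs)
```
A lower perplexity on held-out text indicates that the interpolated model assigns higher probability to the test set, which is the evaluation criterion the experiments use alongside word error rate.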