Estimation of stochastic context-free grammars and their use as language models

Authors:
J. M. Benedí;J. A. Sánchez
Affiliations:
Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain;Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain
Venue:
Computer Speech and Language
Year:
2005

Abstract

This paper addresses the estimation of stochastic context-free grammars (SCFGs) and their use as language models. Classical estimation algorithms are presented in a unified framework together with new ones that restrict the estimation process to a subset of derivations, chosen according to both structural and statistical criteria. The estimated SCFGs are used in a new hybrid language model that combines a word-based n-gram, which captures the local relations between words, with a category-based SCFG and a distribution of words into categories, which together represent the long-term relations between these categories. We describe methods for learning these stochastic models for complex tasks, and we present an algorithm for computing the word transition probability under this hybrid language model. Finally, experiments on the UPenn Treebank corpus show significant improvements in test-set perplexity over classical word trigram models.
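As a rough illustration of the hybrid model described in the abstract, the sketch below combines a word n-gram probability with a category-based probability and computes perplexity. It makes several assumptions that the abstract does not confirm: the two components are mixed by linear interpolation with a hypothetical weight `alpha`, and the category probabilities given the history (which in the paper would come from the SCFG) are passed in as a plain dictionary. All function names and numbers are illustrative only.

```python
import math


def category_word_prob(category_probs, word_given_category, word):
    """Category-based component: sum over categories c of
    P(c | history) * P(word | c).

    `category_probs` stands in for the SCFG-derived distribution over
    the next category (an assumed input here, not computed from a grammar).
    """
    return sum(
        p_c * word_given_category.get(c, {}).get(word, 0.0)
        for c, p_c in category_probs.items()
    )


def hybrid_prob(p_ngram, p_scfg, alpha):
    """Assumed combination rule: linear interpolation of the two models."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_ngram + (1.0 - alpha) * p_scfg


def perplexity(word_probs):
    """Perplexity = exp of the negative mean log probability per word."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))


# Toy numbers, invented for illustration.
category_probs = {"NN": 0.5, "VB": 0.5}            # P(c | history)
word_given_category = {
    "NN": {"dog": 0.2},                            # P(word | NN)
    "VB": {"run": 0.4},                            # P(word | VB)
}

p_scfg = category_word_prob(category_probs, word_given_category, "dog")
p_word = hybrid_prob(p_ngram=0.3, p_scfg=p_scfg, alpha=0.5)
```

With a weight such as `alpha` tuned on held-out data, the per-word probabilities from `hybrid_prob` could be passed to `perplexity` to compare the hybrid model against a plain trigram on the same test set.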