In this paper, a hybrid language model is defined as a combination of a word-based n-gram model, which captures the local relations between words, and a category-based stochastic context-free grammar (SCFG) together with a distribution of words into categories, which represents the long-term relations between these categories. The unsupervised learning of an SCFG in General Format and in Chomsky Normal Form by means of estimation algorithms is studied. Moreover, a bracketed version of the classical estimation algorithm based on the Earley algorithm is proposed. The paper also explores the use of SCFGs obtained from a treebank corpus as initial models for the estimation algorithms. Experiments on the UPenn Treebank corpus are reported, with results evaluated in terms of test-set perplexity and of word error rate in a speech recognition experiment.
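As a rough illustration of how two such models might be combined and scored, the sketch below assumes a simple linear interpolation with a weight `alpha`; the abstract says only "combination", so the interpolation form, the function names, and all probability values here are hypothetical. In a real system the SCFG term would be derived from the grammar and the word-to-category distribution rather than supplied by hand.

```python
import math


def hybrid_prob(p_ngram, p_scfg, alpha):
    """Linearly interpolate two conditional word probabilities.

    alpha weights the n-gram term; (1 - alpha) weights the SCFG term.
    The interpolation form is an assumption, not taken from the abstract.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_ngram + (1.0 - alpha) * p_scfg


def perplexity(word_probs):
    """Perplexity: exp of the negative mean log-probability per word."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# Hypothetical per-word conditional probabilities for a three-word test text.
ngram_probs = [0.20, 0.05, 0.10]  # from the word-based n-gram model
scfg_probs = [0.10, 0.15, 0.10]   # from the category-based SCFG model
alpha = 0.5                       # hypothetical interpolation weight

hybrid_probs = [
    hybrid_prob(n, s, alpha) for n, s in zip(ngram_probs, scfg_probs)
]
test_perplexity = perplexity(hybrid_probs)
print(f"test-set perplexity: {test_perplexity:.3f}")
```

Lower perplexity means the model assigns higher probability to the test text. Under this assumed form, `alpha` would normally be tuned on held-out data rather than fixed in advance.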