On smoothing techniques for bigram-based natural language modelling
ICASSP '91 Proceedings of the Acoustics, Speech, and Signal Processing, 1991. ICASSP-91., 1991 International Conference
Training corpora for stochastic language models are virtually always too small for reliable maximum-likelihood estimation, so smoothing the models is of great importance. This paper derives the cooccurrence smoothing technique for stochastic language modeling and gives experimental evidence for its validity. Using word-bigram language models, cooccurrence smoothing improved test-set perplexity by 14% on a German 100,000-word text corpus and by 10% on an English 1-million-word corpus.
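The abstract does not spell out the smoothing formula, so the sketch below illustrates one plausible cooccurrence-based scheme on a toy corpus: a confusion distribution over histories is derived from shared right-contexts, and the maximum-likelihood bigram is interpolated with the confusion-smoothed estimate. The `p_conf` definition and the interpolation weight `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import math
from collections import Counter

# Toy training corpus (stand-in for the paper's German/English corpora).
corpus = "the cat sat on the mat the dog sat on the rug".split()

# Bigram counts and history (left-word) counts.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])
vocab = sorted(set(corpus))

def p_ml(w, v):
    """Maximum-likelihood bigram probability P(w | v)."""
    return bigrams[(v, w)] / unigrams[v] if unigrams[v] else 0.0

def p_conf(v2, v1):
    """Confusion probability of history v2 given v1, based on how often
    both histories are followed by the same words (an assumed, simple
    cooccurrence measure; normalized over v2)."""
    num = sum(p_ml(w, v1) * p_ml(w, v2) * unigrams[v2] for w in vocab)
    den = sum(p_ml(w, v1) * sum(p_ml(w, u) * unigrams[u] for u in vocab)
              for w in vocab)
    return num / den if den else 0.0

def p_smoothed(w, v, lam=0.7):
    """Interpolate the ML bigram with the cooccurrence-smoothed estimate.
    lam is an illustrative weight, not a value from the paper."""
    smoothed = sum(p_conf(u, v) * p_ml(w, u) for u in vocab)
    return lam * p_ml(w, v) + (1 - lam) * smoothed

def perplexity(text):
    """Bigram test-set perplexity, the evaluation measure the paper uses."""
    logp = sum(math.log(p_smoothed(w, v)) for v, w in zip(text, text[1:]))
    return math.exp(-logp / (len(text) - 1))
```

Because `p_conf` is normalized over histories, the smoothed model remains a proper probability distribution for every seen history, and unseen bigrams can still receive mass through similar histories.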