On smoothing techniques for bigram-based natural language modelling
ICASSP '91 Proceedings of the Acoustics, Speech, and Signal Processing, 1991. ICASSP-91., 1991 International Conference
Training corpora for stochastic language models are virtually always too small for reliable maximum-likelihood estimation, so smoothing the models is of great importance. This paper derives the cooccurrence smoothing technique for stochastic language modeling and gives experimental evidence for its validity. Using word-bigram language models, cooccurrence smoothing improved test-set perplexity by 14% on a German 100,000-word text corpus and by 10% on an English 1-million-word corpus.
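The abstract does not spell out the smoothing formula, so the sketch below illustrates one plausible cooccurrence-based scheme on a toy corpus: a confusion distribution over histories is derived from shared right-contexts, and the maximum-likelihood bigram is interpolated with the confusion-smoothed estimate. The `p_conf` definition and the interpolation weight `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import math
from collections import Counter

# Toy training corpus (stand-in for the paper's German/English corpora).
corpus = "the cat sat on the mat the dog sat on the rug".split()

# Bigram counts and history (left-word) counts.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])
vocab = sorted(set(corpus))

def p_ml(w, v):
    """Maximum-likelihood bigram probability P(w | v)."""
    return bigrams[(v, w)] / unigrams[v] if unigrams[v] else 0.0

def p_conf(v2, v1):
    """Confusion probability of history v2 given v1, based on how often
    both histories are followed by the same words (an assumed, simple
    cooccurrence measure; normalized over v2)."""
    num = sum(p_ml(w, v1) * p_ml(w, v2) * unigrams[v2] for w in vocab)
    den = sum(p_ml(w, v1) * sum(p_ml(w, u) * unigrams[u] for u in vocab)
              for w in vocab)
    return num / den if den else 0.0

def p_smoothed(w, v, lam=0.7):
    """Interpolate the ML bigram with the cooccurrence-smoothed estimate.
    lam is an illustrative weight, not a value from the paper."""
    smoothed = sum(p_conf(u, v) * p_ml(w, u) for u in vocab)
    return lam * p_ml(w, v) + (1 - lam) * smoothed

def perplexity(text):
    """Bigram test-set perplexity, the evaluation measure the paper uses."""
    logp = sum(math.log(p_smoothed(w, v)) for v, w in zip(text, text[1:]))
    return math.exp(-logp / (len(text) - 1))
```

Because `p_conf` is normalized over histories, the smoothed model remains a proper probability distribution for every seen history, and unseen bigrams can still receive mass through similar histories.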