Cooccurrence smoothing for stochastic language modeling

  • Authors:
  • Ute Essen; Volker Steinbiss

  • Affiliations:
  • Philips GmbH Forschungslaboratorien, Aachen, Germany (both authors)

  • Venue:
  • ICASSP'92: Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 1
  • Year:
  • 1992


Abstract

Training corpora for stochastic language models are virtually always too small for maximum-likelihood estimation, so smoothing the models is of great importance. This paper derives the cooccurrence smoothing technique for stochastic language modeling and gives experimental evidence for its validity. Using word-bigram language models, cooccurrence smoothing improved the test-set perplexity by 14% on a German 100,000-word text corpus and by 10% on an English 1-million-word corpus.
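The abstract does not spell out the smoothing formula, but the general idea of cooccurrence smoothing is to redistribute bigram probability mass among words that tend to appear in the same contexts. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's exact derivation: it builds maximum-likelihood bigram probabilities, derives a "confusion" distribution P_C(w | w') from shared left contexts, and uses it to smooth P(w | v). All function names and the particular form of P_C are assumptions for illustration.

```python
from collections import defaultdict

def bigram_model(tokens):
    """Maximum-likelihood bigram probabilities P(w | v) from a token list."""
    counts = defaultdict(lambda: defaultdict(int))
    for v, w in zip(tokens, tokens[1:]):
        counts[v][w] += 1
    probs = {}
    for v, row in counts.items():
        total = sum(row.values())
        probs[v] = {w: c / total for w, c in row.items()}
    return probs

def confusion_probs(probs):
    """Hypothetical cooccurrence ('confusion') distributions: words that
    follow the same left contexts count as similar.
    P_C(w | w2) is taken proportional to sum_v P(w | v) * P(w2 | v)."""
    pc = defaultdict(lambda: defaultdict(float))
    for v, row in probs.items():
        for w2, p2 in row.items():
            for w, p in row.items():
                pc[w2][w] += p * p2
    for w2, row in pc.items():          # normalize each row to a distribution
        z = sum(row.values())
        for w in row:
            row[w] /= z
    return pc

def smoothed_prob(probs, pc, v, w):
    """Smoothed P(w | v): redistribute the mass of each observed successor
    w2 of v toward words w that cooccur with w2 in similar contexts."""
    return sum(p2 * pc[w2].get(w, 0.0) for w2, p2 in probs.get(v, {}).items())
```

Because each confusion row is a proper distribution, the smoothed model still sums to one over the vocabulary, while assigning nonzero probability to some bigrams never observed in training, which is exactly what lowers test-set perplexity on sparse data.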