Smoothing methods in maximum entropy language modeling

  • Authors:
  • S. C. Martin; H. Ney; J. Zaplo

  • Affiliations:
  • Lehrstuhl für Informatik, Technische Hochschule Aachen, Germany

  • Venue:
  • ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
  • Year:
  • 1999

Abstract

This paper discusses various aspects of smoothing techniques in maximum entropy language modeling, a topic not sufficiently covered by previous publications. We show: (1) that straightforward maximum entropy models with nested features, e.g. tri-, bi-, and unigrams, result in unsmoothed relative-frequency models; (2) that maximum entropy models with nested features and discounted feature counts approximate backing-off smoothed relative-frequency models with Kneser's advanced marginal backoff distribution (see IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Detroit, MI, vol. I, p. 181-184, 1995), which explains some of the reported success of maximum entropy models in the past; and (3) perplexity results for nested and non-nested features, e.g. trigrams and distance trigrams, on a 4-million-word subset of the Wall Street Journal corpus, showing that the smoothing method has a greater effect on perplexity than the method used to combine information.
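
Finding (2) relates maximum entropy training with discounted feature counts to backing-off smoothing with Kneser's marginal backoff distribution. As a rough illustration of the backing-off side of that comparison only, the Python sketch below builds a bigram model with absolute discounting and a Kneser-style continuation distribution as the backoff. The function name, the toy corpus, and the discount value 0.75 are illustrative assumptions, not taken from the paper, and the sketch assigns mass only to words observed as continuations in training.

```python
from collections import Counter

def kneser_ney_bigram(tokens, discount=0.75):
    """Bigram model with absolute discounting and a Kneser-style
    continuation distribution as the backoff (illustrative sketch)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    history_counts = Counter(tokens[:-1])
    # Continuation count: number of distinct histories each word follows.
    continuation = Counter(w2 for (_, w2) in bigrams)
    bigram_types = len(bigrams)
    # Number of distinct successors per history (one discount paid per type).
    successors = Counter(w1 for (w1, _) in bigrams)

    def prob(w1, w2):
        # Discounted relative frequency of the observed bigram.
        p_disc = max(bigrams[(w1, w2)] - discount, 0.0) / history_counts[w1]
        # Backoff weight: probability mass freed up by discounting.
        lam = discount * successors[w1] / history_counts[w1]
        # Kneser-style backoff: how many distinct histories w2 follows,
        # rather than its raw unigram frequency.
        p_cont = continuation[w2] / bigram_types
        return p_disc + lam * p_cont

    return prob

# Toy usage on an invented corpus.
tokens = "the cat sat on the mat and the cat ran to the mat".split()
p = kneser_ney_bigram(tokens)
print(f"P(cat | the) = {p('the', 'cat'):.3f}")
print(f"P(mat | the) = {p('the', 'mat'):.3f}")
```

The design choice to note is in the backoff: the mass freed by discounting the observed bigram counts is redistributed according to how many distinct histories a word follows (its continuation count), not its raw frequency, which is the marginal-distribution idea the paper's maximum entropy models are shown to approximate.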