This paper discusses various aspects of smoothing techniques in maximum entropy language modeling, a topic not sufficiently covered by previous publications. We show: (1) that straightforward maximum entropy models with nested features, e.g. tri-, bi-, and unigrams, result in unsmoothed relative-frequency models; (2) that maximum entropy models with nested features and discounted feature counts approximate backing-off smoothed relative-frequency models with Kneser's (see IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Detroit, MI, vol. I, p. 181-84, 1995) advanced marginal backoff distribution, which explains some of the reported success of maximum entropy models in the past; and (3) perplexity results for nested and non-nested features, e.g. trigrams and distance-trigrams, on a 4-million-word subset of the Wall Street Journal corpus, showing that the smoothing method has more effect on the perplexity than the method used to combine information.
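The contrast between points (1) and (2) can be sketched numerically. The toy below is an illustrative sketch only, not the paper's maximum entropy training: it compares an unsmoothed relative-frequency bigram model with a simple absolute-discounting model that interpolates with a unigram distribution (plain unigram backoff, not Kneser's marginal backoff distribution). The corpus, test sentence, and discount value `d` are invented for illustration.

```python
# Illustrative sketch: why unsmoothed relative frequencies break on
# unseen events, while a discounted/backoff model stays finite.
# Not the paper's maximum entropy method; toy data and d are assumptions.
import math
from collections import Counter

train = "the cat sat on the mat the cat ate the rat".split()
test = "the mat sat on the cat".split()  # contains the unseen bigram (mat, sat)

unigrams = Counter(train)
bigrams = Counter(zip(train, train[1:]))
N = len(train)

def p_rel_freq(w, h):
    """Unsmoothed relative frequency p(w|h) = c(h,w) / c(h)."""
    return bigrams[(h, w)] / unigrams[h] if unigrams[h] else 0.0

def p_discounted(w, h, d=0.5):
    """Absolute discounting, interpolated with the unigram distribution."""
    c_h = unigrams[h]
    n_types = sum(1 for (a, _) in bigrams if a == h)  # distinct continuations of h
    if c_h == 0 or n_types == 0:
        return unigrams[w] / N  # nothing observed after h: pure unigram
    disc = max(bigrams[(h, w)] - d, 0.0) / c_h
    lam = d * n_types / c_h  # probability mass freed by discounting
    return disc + lam * unigrams[w] / N

def perplexity(p, seq):
    logp = 0.0
    for h, w in zip(seq, seq[1:]):
        prob = p(w, h)
        if prob == 0.0:
            return math.inf  # unsmoothed model fails on an unseen bigram
        logp += math.log(prob)
    return math.exp(-logp / (len(seq) - 1))
```

With these definitions, the unsmoothed model assigns zero probability to the unseen bigram and hence infinite perplexity, while the discounted model stays finite, mirroring the abstract's point that the smoothing choice dominates perplexity.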