This paper discusses various aspects of smoothing techniques in maximum entropy language modeling, a topic not sufficiently covered by previous publications. We show: (1) that straightforward maximum entropy models with nested features, e.g. tri-, bi-, and unigrams, result in unsmoothed relative-frequency models; (2) that maximum entropy models with nested features and discounted feature counts approximate backing-off smoothed relative-frequency models with Kneser's (see IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Detroit, MI, vol. I, p. 181-84, 1995) advanced marginal backoff distribution, which explains some of the reported success of maximum entropy models in the past; and (3) perplexity results for nested and non-nested features, e.g. trigrams and distance-trigrams, on a 4-million-word subset of the Wall Street Journal corpus, showing that the smoothing method has more effect on the perplexity than the method used to combine information.
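The contrast between points (1) and (2) can be sketched numerically. The toy below is an illustrative sketch only, not the paper's maximum entropy training: it compares an unsmoothed relative-frequency bigram model with a simple absolute-discounting model that interpolates with a unigram distribution (plain unigram backoff, not Kneser's marginal backoff distribution). The corpus, test sentence, and discount value `d` are invented for illustration.

```python
# Illustrative sketch: why unsmoothed relative frequencies break on
# unseen events, while a discounted/backoff model stays finite.
# Not the paper's maximum entropy method; toy data and d are assumptions.
import math
from collections import Counter

train = "the cat sat on the mat the cat ate the rat".split()
test = "the mat sat on the cat".split()  # contains the unseen bigram (mat, sat)

unigrams = Counter(train)
bigrams = Counter(zip(train, train[1:]))
N = len(train)

def p_rel_freq(w, h):
    """Unsmoothed relative frequency p(w|h) = c(h,w) / c(h)."""
    return bigrams[(h, w)] / unigrams[h] if unigrams[h] else 0.0

def p_discounted(w, h, d=0.5):
    """Absolute discounting, interpolated with the unigram distribution."""
    c_h = unigrams[h]
    n_types = sum(1 for (a, _) in bigrams if a == h)  # distinct continuations of h
    if c_h == 0 or n_types == 0:
        return unigrams[w] / N  # nothing observed after h: pure unigram
    disc = max(bigrams[(h, w)] - d, 0.0) / c_h
    lam = d * n_types / c_h  # probability mass freed by discounting
    return disc + lam * unigrams[w] / N

def perplexity(p, seq):
    logp = 0.0
    for h, w in zip(seq, seq[1:]):
        prob = p(w, h)
        if prob == 0.0:
            return math.inf  # unsmoothed model fails on an unseen bigram
        logp += math.log(prob)
    return math.exp(-logp / (len(seq) - 1))
```

With these definitions, the unsmoothed model assigns zero probability to the unseen bigram and hence infinite perplexity, while the discounted model stays finite, mirroring the abstract's point that the smoothing choice dominates perplexity.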