We investigate the task of performance prediction for language models belonging to the exponential family. First, we attempt to empirically discover a formula for predicting test set cross-entropy for n-gram language models. We build models over varying domains, data set sizes, and n-gram orders, and perform linear regression to see whether we can model test set performance as a simple function of training set performance and various model statistics. Remarkably, we find a simple relationship that predicts test set performance with a correlation of 0.9997 between predicted and actual cross-entropy. We analyze why this relationship holds and show that it holds for other exponential language models as well, including class-based models and minimum discrimination information models. Finally, we discuss how this relationship can be applied to improve language model performance.
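To make the regression setup concrete, here is a minimal sketch of fitting test set cross-entropy as a linear function of training set cross-entropy and one model statistic. The choice of statistic (aggregate parameter magnitude per training word), the feature matrix, and all numeric values below are illustrative assumptions, not the paper's actual experimental data or its fitted formula.

```python
import numpy as np

# Illustrative sketch: predict test cross-entropy from training cross-entropy
# plus one model statistic via least-squares linear regression.
# One row per trained n-gram model:
#   [training cross-entropy (bits/word),
#    hypothetical statistic, e.g. sum_i |lambda_i| / number of training words]
X = np.array([
    [6.10, 0.52],
    [5.45, 0.78],
    [7.02, 0.31],
    [5.90, 0.66],
    [6.70, 0.40],
])
# Observed test set cross-entropy for each model (toy values, for illustration only).
y = np.array([6.58, 6.20, 7.31, 6.53, 7.08])

# Fit test_H ~ a * train_H + b * stat + c by ordinary least squares.
A = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Evaluate how well the fitted formula predicts held-out performance.
predictions = A @ coef
corr = np.corrcoef(predictions, y)[0, 1]
print("coefficients (train_H, stat, intercept):", coef)
print("correlation between predicted and actual test cross-entropy:", corr)
```

In practice one would fit such a regression over many models spanning different domains, training set sizes, and n-gram orders, then check whether a single set of coefficients predicts test performance accurately across all of them.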