We investigate the task of performance prediction for language models belonging to the exponential family. First, we attempt to empirically discover a formula for predicting test set cross-entropy for n-gram language models. We build models over varying domains, data set sizes, and n-gram orders, and perform linear regression to see whether we can model test set performance as a simple function of training set performance and various model statistics. Remarkably, we find a simple relationship that predicts test set performance with a correlation of 0.9997 between predicted and actual cross-entropy. We analyze why this relationship holds and show that it holds for other exponential language models as well, including class-based models and minimum discrimination information models. Finally, we discuss how this relationship can be applied to improve language model performance.
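To make the regression setup concrete, here is a minimal sketch of fitting test set cross-entropy as a linear function of training set cross-entropy and one model statistic. The choice of statistic (aggregate parameter magnitude per training word), the feature matrix, and all numeric values below are illustrative assumptions, not the paper's actual experimental data or its fitted formula.

```python
import numpy as np

# Illustrative sketch: predict test cross-entropy from training cross-entropy
# plus one model statistic via least-squares linear regression.
# One row per trained n-gram model:
#   [training cross-entropy (bits/word),
#    hypothetical statistic, e.g. sum_i |lambda_i| / number of training words]
X = np.array([
    [6.10, 0.52],
    [5.45, 0.78],
    [7.02, 0.31],
    [5.90, 0.66],
    [6.70, 0.40],
])
# Observed test set cross-entropy for each model (toy values, for illustration only).
y = np.array([6.58, 6.20, 7.31, 6.53, 7.08])

# Fit test_H ~ a * train_H + b * stat + c by ordinary least squares.
A = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Evaluate how well the fitted formula predicts held-out performance.
predictions = A @ coef
corr = np.corrcoef(predictions, y)[0, 1]
print("coefficients (train_H, stat, intercept):", coef)
print("correlation between predicted and actual test cross-entropy:", corr)
```

In practice one would fit such a regression over many models spanning different domains, training set sizes, and n-gram orders, then check whether a single set of coefficients predicts test performance accurately across all of them.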