Shrinking exponential language models

Authors:
Stanley F. Chen
Affiliations:
IBM T.J. Watson Research Center, Yorktown Heights, NY
Venue:
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Year:
2009

Citing 10
Cited 8

A dynamic language model for speech recognition

HLT '91 Proceedings of the workshop on Speech and Natural Language
Class-based n-gram models of natural language

Computational Linguistics
Task Adaptation Using MAP Estimation in N-Gram Language Modeling

ICASSP '97 Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97)-Volume 2 - Volume 2
Adaptive language modeling using minimum discriminant estimation

HLT '91 Proceedings of the workshop on Speech and Natural Language
The design for the wall street journal-based CSR corpus

HLT '91 Proceedings of the workshop on Speech and Natural Language
The SuperARV language model: investigating the effectiveness of tightly integrating multiple knowledge sources

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Evaluation and extension of maximum entropy models with inequality constraints

EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Multi-class composite N-gram based on connection direction

ICASSP '99 Proceedings of the Acoustics, Speech, and Signal Processing, 1999. on 1999 IEEE International Conference - Volume 01
Performance prediction for exponential language models

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
MAP adaptation of stochastic grammars

Computer Speech and Language

Performance prediction for exponential language models

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Domain adaptation of maximum entropy language models

ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Learning to transform and select elementary trees for improved syntax-based machine translations

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Integrating history-length interpolation and classes in language modeling

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Acoustically discriminative language model training with pseudo-hypothesis

Speech Communication
Leveraging word confusion networks for named entity modeling and detection from conversational telephone speech

Speech Communication
Computational approaches to sentence completion

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
A challenge set for advancing language modeling

WLM '12 Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT

Quantified Score

Hi-index	0.00

Visualization

Abstract

In (Chen, 2009), we show that for a variety of language models belonging to the exponential family, the test set cross-entropy of a model can be accurately predicted from its training set cross-entropy and its parameter values. In this work, we show how this relationship can be used to motivate two heuristics for "shrinking" the size of a language model to improve its performance. We use the first heuristic to develop a novel class-based language model that outperforms a baseline word trigram model by 28% in perplexity and 1.9% absolute in speech recognition word-error rate on Wall Street Journal data. We use the second heuristic to motivate a regularized version of minimum discrimination information models and show that this method outperforms other techniques for domain adaptation.