Multiple language models are combined for many tasks in language modeling, such as domain and topic adaptation. In this work, we compare on-line algorithms from machine learning to existing methods for combining language models. The on-line algorithms developed for this problem have parameters that are updated dynamically to adapt to a data set during evaluation. On-line analysis guarantees that these algorithms will perform nearly as well as the best model chosen in hindsight from a large class of models, e.g., the set of all static mixtures. We describe several on-line algorithms and present results comparing these techniques with existing language model combination methods on the task of domain adaptation. We demonstrate that, in some situations, on-line techniques can significantly outperform static mixtures (by over 10% in terms of perplexity) and are especially effective when the nature of the test data is unknown or changes over time.
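To make the idea concrete, here is a minimal sketch of one standard on-line combination scheme of the kind the abstract describes: a Bayesian mixture of experts, where each component model's weight is multiplied by the probability it assigned to the observed token and the weights are then renormalized. The specific function name and the toy unigram models are illustrative, not from the paper; the paper's algorithms and experimental setup may differ.

```python
import math

def online_mixture_nll(models, stream):
    """Combine expert language models on-line with a Bayesian mixture.

    `models` is a list of distributions mapping token -> probability
    (toy stand-ins for full language models). After each observed token,
    every expert's weight is multiplied by the probability it assigned
    to that token, then the weights are renormalized. The mixture's
    total log-loss is guaranteed to be within log(K) of the best single
    expert chosen in hindsight, where K is the number of experts.

    Returns the total negative log-likelihood and the final weights.
    """
    k = len(models)
    weights = [1.0 / k] * k          # start from a uniform prior
    nll = 0.0
    for token in stream:
        probs = [m.get(token, 1e-12) for m in models]   # floor for OOV tokens
        mix = sum(w * p for w, p in zip(weights, probs))
        nll -= math.log(mix)
        # Multiplicative (Bayes) update, then renormalize.
        weights = [w * p for w, p in zip(weights, probs)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return nll, weights
```

For example, with two toy unigram models and a stream drawn mostly from the first model's high-probability token, the weights shift rapidly toward the better-matched expert, and the mixture's log-loss stays within log 2 of that expert's. Variants such as fixed-share updates (which mix a small amount of weight back toward uniform after each step) handle the case the abstract highlights, where the nature of the test data changes over time.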