An empirical study on language model adaptation

Authors:
Jianfeng Gao;Hisami Suzuki;Wei Yuan
Affiliations:
Microsoft Research, Redmond, WA;Microsoft Research, Redmond, WA;Shanghai Jiao Tong University, Shanghai, China
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2006

Abstract

This article presents an empirical study of four techniques for adapting language models, namely a maximum a posteriori (MAP) method and three discriminative training methods, applied to Japanese Kana-Kanji conversion. We compare the performance of these methods from various angles by adapting the baseline model to four adaptation domains. In particular, we attempt to interpret the character error rate (CER) results by correlating them with the characteristics of each adaptation domain, measured using the information-theoretic notion of cross entropy. We show that this metric correlates well with the CER performance of the adaptation methods. We also show that the discriminative methods are superior to the MAP-based method not only in achieving larger CER reductions, but also in having fewer side effects and being more robust against the similarity between background and adaptation domains.
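
The two quantities the abstract relies on, cross entropy as a measure of domain similarity and CER as the evaluation metric, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The add-one-smoothed unigram model, the edit-distance definition of CER, and the function names `cross_entropy` and `character_error_rate` are all assumptions made for illustration; the abstract does not specify the model order, smoothing, or exact CER definition.

```python
import math
from collections import Counter


def cross_entropy(train_tokens, test_tokens):
    """Per-token cross entropy (in bits) of test_tokens under a unigram model.

    Assumption: the model is an add-one-smoothed unigram estimated from
    train_tokens, with the vocabulary taken as the union of both token lists.
    A lower value suggests the test domain is closer to the training domain.
    """
    counts = Counter(train_tokens)
    vocab = set(train_tokens) | set(test_tokens)
    total = len(train_tokens) + len(vocab)  # add-one smoothing denominator
    log_prob = sum(math.log2((counts[t] + 1) / total) for t in test_tokens)
    return -log_prob / len(test_tokens)


def character_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length.

    Assumption: CER is approximated here by Levenshtein distance
    (insertions, deletions, substitutions) over characters.
    """
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, 1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, 1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_char != hyp_char),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(reference)


# Toy data (hypothetical): characters stand in for words.
background = list("aaaabbbb")
similar_domain = list("aabb")
distant_domain = list("ccdd")

print(cross_entropy(background, similar_domain))
print(cross_entropy(background, distant_domain))
print(character_error_rate("abcd", "abxd"))
```

In this toy setting the domain sharing no tokens with the background data receives the higher cross entropy, which is the sense in which cross entropy can serve as a rough indicator of how far an adaptation domain lies from the background domain.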