Minimum sample risk methods for language modeling

Authors:
Jianfeng Gao;Hao Yu;Wei Yuan;Peng Xu
Affiliations:
Microsoft Research Asia;Shanghai Jiaotong Univ., China;Shanghai Jiaotong Univ., China;John Hopkins Univ.
Venue:
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Year:
2005

Citing 11
Cited 9

The nature of statistical learning theory

The nature of statistical learning theory
Statistical methods for speech recognition

Statistical methods for speech recognition
Numerical Recipes in C: The Art of Scientific Computing

Numerical Recipes in C: The Art of Scientific Computing
Discriminative Reranking for Natural Language Parsing

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
An Efficient Boosting Algorithm for Combining Preferences

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Pattern Classification (2nd Edition)

Pattern Classification (2nd Edition)
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Exploiting headword dependency and predictive clustering for language modeling

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Dependency treelet translation: syntactically informed phrasal SMT

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
A comparative study on language model adaptation techniques using new evaluation metrics

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing

Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach

Computational Linguistics
An empirical study on language model adaptation

ACM Transactions on Asian Language Information Processing (TALIP)
Approximation lasso methods for language modeling

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
A comparative study on language model adaptation techniques using new evaluation metrics

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Statistical query translation models for cross-language information retrieval

ACM Transactions on Asian Language Information Processing (TALIP)
The study of a nonstationary maximum entropy Markov model and its application on the pos-tagging task

ACM Transactions on Asian Language Information Processing (TALIP)
Refining generative language models using discriminative learning

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Mining English-Chinese Named Entity Pairs from Comparable Corpora

ACM Transactions on Asian Language Information Processing (TALIP)
Large-scale discriminative language model reranking for voice-search

WLM '12 Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper proposes a new discriminative training method, called minimum sample risk (MSR), of estimating parameters of language models for text input. While most existing discriminative training methods use a loss function that can be optimized easily but approaches only approximately to the objective of minimum error rate, MSR minimizes the training error directly using a heuristic training procedure. Evaluations on the task of Japanese text input show that MSR can handle a large number of features and training samples; it significantly outperforms a regular trigram model trained using maximum likelihood estimation, and it also outperforms the two widely applied discriminative methods, the boosting and the perceptron algorithms, by a small but statistically significant margin.