Building on earlier work that integrates different factors in language modeling, we view (i) backing off to a shorter history and (ii) class-based generalization as two complementary mechanisms for using a larger equivalence class for prediction when the default equivalence class is too small for reliable estimation. This view entails that the classes in a language model should be learned from rare events only and should preferentially be applied to rare events. We construct such a model and show that both training on rare events and preferential application to rare events improve perplexity compared to a simple direct interpolation of a class-based model with a standard language model.
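To make the idea concrete, the following is a minimal sketch (not the paper's actual model) of interpolating a word-bigram model with a class-based bigram model, where the weight on the class-based component grows as the history becomes rarer, so class generalization is applied preferentially to rare events. The function names, the add-one smoothing, and the count-dependent interpolation weight are illustrative assumptions.

```python
# Illustrative sketch only: a word-bigram model interpolated with a
# class-based bigram model.  The interpolation weight depends on how often
# the history word was seen, so rare histories lean on the class-based
# estimate.  Class assignments, smoothing, and constants are assumptions
# made for this example, not the paper's formulation.
from collections import Counter

def train(corpus, word2class):
    """Collect word-unigram, word-bigram, class, class-bigram and emission counts."""
    uni, bi = Counter(), Counter()
    cls, class_bi, emit = Counter(), Counter(), Counter()
    for sent in corpus:
        for w in sent:
            uni[w] += 1
            cls[word2class[w]] += 1
            emit[(word2class[w], w)] += 1          # count of w within its class
        for w1, w2 in zip(sent, sent[1:]):
            bi[(w1, w2)] += 1
            class_bi[(word2class[w1], word2class[w2])] += 1
    return uni, bi, cls, class_bi, emit

def prob(w1, w2, word2class, counts, vocab_size, alpha=1.0, rare_scale=5.0):
    """P(w2 | w1): interpolate a word bigram with a class-based bigram.

    The weight on the class-based part grows as the history w1 becomes rarer,
    i.e. class-based generalization is applied preferentially to rare events.
    """
    uni, bi, cls, class_bi, emit = counts
    c1, c2 = word2class[w1], word2class[w2]
    n_classes = len(cls)
    # add-one smoothed word-bigram estimate
    p_word = (bi[(w1, w2)] + alpha) / (uni[w1] + alpha * vocab_size)
    # class-based estimate: P(c2 | c1) * P(w2 | c2)
    p_cc = (class_bi[(c1, c2)] + alpha) / (cls[c1] + alpha * n_classes)
    p_emit = (emit[(c2, w2)] + alpha) / (cls[c2] + alpha * vocab_size)
    p_class = p_cc * p_emit
    # history-frequency-dependent weight: frequent histories trust the word model
    lam = uni[w1] / (uni[w1] + rare_scale)
    return lam * p_word + (1.0 - lam) * p_class

if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
    word2class = {"the": "DET", "a": "DET", "cat": "N", "dog": "N",
                  "sat": "V", "ran": "V"}
    counts = train(corpus, word2class)
    vocab = {w for s in corpus for w in s}
    # "a" is a rare history here, so the class-based component dominates
    print(prob("a", "dog", word2class, counts, len(vocab)))
```

In this toy setup, a larger rare_scale shifts more probability mass toward the class-based component for infrequent histories; the abstract's further point is that the word classes themselves should also be learned from rare events only, rather than from the full training distribution.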