We propose a distribution-based pruning method for n-gram backoff language models. The conventional approach prunes n-grams that are infrequent in the training data. We instead prune n-grams that are likely to be infrequent in a new document. Our method is based on the n-gram distribution, i.e., the probability that an n-gram occurs in a new document. Experimental results show that our method performed 7–9% better than conventional cutoff methods, measured as word perplexity reduction.
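The difference between the two pruning criteria can be illustrated with a minimal sketch. It assumes that the probability of an n-gram occurring in a new document is estimated simply as the fraction of training documents that contain it. The abstract does not give the paper's actual estimator, and all function names here are hypothetical. The sketch only selects which n-grams to keep. It does not re-estimate the backoff weights of the pruned model.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

NGram = Tuple[str, ...]


def extract_ngrams(tokens: List[str], n: int) -> List[NGram]:
    """Return every contiguous n-gram of order n in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_cutoff_prune(docs: List[List[str]], n: int, cutoff: int) -> Set[NGram]:
    """Conventional baseline: keep n-grams whose total training count exceeds cutoff."""
    counts = Counter(g for doc in docs for g in extract_ngrams(doc, n))
    return {g for g, c in counts.items() if c > cutoff}


def occurrence_probability(docs: List[List[str]], n: int) -> Dict[NGram, float]:
    """Hypothetical estimator of P(n-gram occurs in a new document):
    the fraction of training documents that contain the n-gram at least once."""
    doc_freq: Counter = Counter()
    for doc in docs:
        # set() so that repeated occurrences inside one document count only once
        doc_freq.update(set(extract_ngrams(doc, n)))
    return {g: df / len(docs) for g, df in doc_freq.items()}


def distribution_prune(docs: List[List[str]], n: int, threshold: float) -> Set[NGram]:
    """Distribution-based criterion: keep n-grams likely to occur in a new document."""
    probs = occurrence_probability(docs, n)
    return {g for g, p in probs.items() if p >= threshold}


if __name__ == "__main__":
    docs = [
        "the cat sat the cat sat the cat sat".split(),  # one "bursty" document
        "a dog ran".split(),
        "a dog slept".split(),
        "a dog barked".split(),
    ]
    print(sorted(count_cutoff_prune(docs, 2, cutoff=2)))
    print(sorted(distribution_prune(docs, 2, threshold=0.5)))
```

In this toy corpus, the bigram ("the", "cat") occurs three times but in only one of four documents. It therefore survives a count cutoff of 2, but the distribution-based criterion prunes it. The bigram ("a", "dog") is kept by both criteria because it appears in three of the four documents.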