The recent availability of large corpora for training N-gram language models has shown the utility of models of higher order than just trigrams. In this paper, we investigate methods to control the increase in model size resulting from applying standard methods at higher orders. We introduce significance-based N-gram selection, which not only reduces model size, but also improves perplexity for several smoothing methods, including Katz backoff and absolute discounting. We also show that, when combined with a new smoothing method and a novel variant of weighted-difference pruning, our selection method performs better in the trade-off between model size and perplexity than the best pruning method we found for modified Kneser-Ney smoothing.
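To make the smoothing baselines mentioned in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation, of interpolated absolute discounting for a bigram model. The function name, the dictionary-based counts, and the fixed discount of 0.75 are assumptions made for this example only; the paper's significance-based selection and pruning variants are not reproduced here.

from collections import Counter, defaultdict

def train_absolute_discounting(tokens, discount=0.75):
    # Collect unigram counts, bigram counts, per-context totals, and the set of
    # distinct continuations observed after each context word.
    unigram = Counter(tokens)
    bigram = Counter(zip(tokens, tokens[1:]))
    context_total = Counter(h for h, _ in zip(tokens, tokens[1:]))
    followers = defaultdict(set)
    for h, w in bigram:
        followers[h].add(w)
    total = sum(unigram.values())

    def prob(w, h):
        # Discounted bigram estimate plus back-off weight times the unigram probability.
        c_hw = bigram.get((h, w), 0)
        c_h = context_total.get(h, 0)
        p_uni = unigram.get(w, 0) / total
        if c_h == 0:
            return p_uni  # unseen context: fall back to the unigram distribution
        # The mass removed by discounting each observed bigram is redistributed
        # in proportion to the unigram probability of the predicted word.
        backoff_weight = discount * len(followers[h]) / c_h
        return max(c_hw - discount, 0) / c_h + backoff_weight * p_uni

    return prob

# Example usage on a toy corpus:
#   p = train_absolute_discounting("the cat sat on the mat".split())
#   print(p("cat", "the"))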