The recent availability of large corpora for training N-gram language models has shown the utility of models of higher order than just trigrams. In this paper, we investigate methods to control the increase in model size resulting from applying standard methods at higher orders. We introduce significance-based N-gram selection, which not only reduces model size, but also improves perplexity for several smoothing methods, including Katz backoff and absolute discounting. We also show that, when combined with a new smoothing method and a novel variant of weighted-difference pruning, our selection method performs better in the trade-off between model size and perplexity than the best pruning method we found for modified Kneser-Ney smoothing.
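To make the smoothing baselines mentioned in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation, of interpolated absolute discounting for a bigram model. The function name, the dictionary-based counts, and the fixed discount of 0.75 are assumptions made for this example only; the paper's significance-based selection and pruning variants are not reproduced here.

from collections import Counter, defaultdict

def train_absolute_discounting(tokens, discount=0.75):
    # Collect unigram counts, bigram counts, per-context totals, and the set of
    # distinct continuations observed after each context word.
    unigram = Counter(tokens)
    bigram = Counter(zip(tokens, tokens[1:]))
    context_total = Counter(h for h, _ in zip(tokens, tokens[1:]))
    followers = defaultdict(set)
    for h, w in bigram:
        followers[h].add(w)
    total = sum(unigram.values())

    def prob(w, h):
        # Discounted bigram estimate plus back-off weight times the unigram probability.
        c_hw = bigram.get((h, w), 0)
        c_h = context_total.get(h, 0)
        p_uni = unigram.get(w, 0) / total
        if c_h == 0:
            return p_uni  # unseen context: fall back to the unigram distribution
        # The mass removed by discounting each observed bigram is redistributed
        # in proportion to the unigram probability of the predicted word.
        backoff_weight = discount * len(followers[h]) / c_h
        return max(c_hw - discount, 0) / c_h + backoff_weight * p_uni

    return prob

# Example usage on a toy corpus:
#   p = train_absolute_discounting("the cat sat on the mat".split())
#   print(p("cat", "the"))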