N-gram language models are a major resource bottleneck in machine translation. In this paper, we present several language model implementations that are both highly compact and fast to query. Our fastest implementation is as fast as the widely used SRILM while requiring only 25% of the storage. Our most compact representation can store all 4 billion n-grams and associated counts for the Google n-gram corpus in 23 bits per n-gram, the most compact lossless representation to date, and even more compact than recent lossy compression techniques. We also discuss techniques for improving query speed during decoding, including a simple but novel language model caching technique that improves the query speed of our language models (and SRILM) by up to 300%.
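The abstract does not say how the caching technique works, so the following is only a minimal sketch of one plausible form: a fixed-size, direct-mapped cache placed in front of an arbitrary n-gram scoring function. The class name `DirectMappedLMCache`, the `score_fn` callback, the slot count, and the toy scoring function are all hypothetical and are not taken from the paper or from SRILM.

```python
from typing import Callable, List, Optional, Tuple

NGram = Tuple[str, ...]


class DirectMappedLMCache:
    """Hypothetical fixed-size cache in front of a slow n-gram scorer.

    Each n-gram hashes to exactly one slot. A colliding n-gram simply
    overwrites the previous occupant, so memory use stays constant.
    """

    def __init__(self, score_fn: Callable[[NGram], float],
                 num_slots: int = 1 << 16) -> None:
        self._score_fn = score_fn
        self._keys: List[Optional[NGram]] = [None] * num_slots
        self._values: List[float] = [0.0] * num_slots
        self.hits = 0
        self.misses = 0

    def score(self, ngram: NGram) -> float:
        # Python's % always yields a non-negative slot index here.
        slot = hash(ngram) % len(self._keys)
        if self._keys[slot] == ngram:
            self.hits += 1
            return self._values[slot]
        # Miss: query the underlying model and overwrite the slot.
        self.misses += 1
        value = self._score_fn(ngram)
        self._keys[slot] = ngram
        self._values[slot] = value
        return value


def toy_score(ngram: NGram) -> float:
    """Stand-in for a real language model query (assumption)."""
    return -float(len(ngram))


cache = DirectMappedLMCache(toy_score, num_slots=1024)
first = cache.score(("the", "cat", "sat"))
second = cache.score(("the", "cat", "sat"))  # served from the cache
```

A decoder tends to request the same n-grams many times while scoring overlapping hypotheses, which is why even a small cache of this kind can plausibly cut the number of full model lookups. Whether it matches the speedup reported above depends on the actual design in the paper.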