We present KenLM, a library that implements two data structures for efficient language model queries, reducing both time and memory costs. The Probing data structure uses linear probing hash tables and is designed for speed. Compared with the widely used SRILM, our Probing model is 2.4 times as fast while using 57% of the memory. The Trie data structure is a trie with bit-level packing, sorted records, interpolation search, and optional quantization, aimed at lower memory consumption. Trie simultaneously uses less memory than the smallest lossless baseline and less CPU than the fastest baseline. Our code is open source, thread-safe, and integrated into the Moses, cdec, and Joshua translation systems. This paper describes the performance techniques used and presents benchmarks against alternative implementations.
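The two lookup techniques named in the abstract, linear probing and interpolation search, can be illustrated with a minimal Python sketch. This is not KenLM's code: the names `ProbingTable` and `interpolation_search` are hypothetical, plain integers stand in for hashed or vocabulary-indexed n-grams, and the sketch omits bit-level packing and quantization entirely.

```python
from typing import List, Optional


class ProbingTable:
    """Open-addressing hash table with linear probing (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        # One slot always stays empty so that every probe sequence terminates.
        self.keys: List[Optional[int]] = [None] * capacity
        self.values: List[float] = [0.0] * capacity
        self.size = 0

    def insert(self, key: int, value: float) -> None:
        i = key % len(self.keys)
        while self.keys[i] is not None and self.keys[i] != key:
            i = (i + 1) % len(self.keys)  # step to the next slot, wrapping around
        if self.keys[i] is None:
            if self.size + 1 >= len(self.keys):
                raise ValueError("table full; keep at least one empty slot")
            self.size += 1
        self.keys[i] = key
        self.values[i] = value

    def find(self, key: int) -> Optional[float]:
        i = key % len(self.keys)
        while self.keys[i] is not None:
            if self.keys[i] == key:
                return self.values[i]
            i = (i + 1) % len(self.keys)
        return None  # reached an empty slot, so the key is absent


def interpolation_search(sorted_keys: List[int], target: int) -> Optional[int]:
    """Return the index of target in an ascending list, or None if absent."""
    lo, hi = 0, len(sorted_keys) - 1
    while lo <= hi and sorted_keys[lo] <= target <= sorted_keys[hi]:
        if sorted_keys[hi] == sorted_keys[lo]:
            # All remaining keys are equal; the loop condition implies a match.
            return lo
        # Estimate the position, assuming keys are spread roughly uniformly.
        pivot = lo + (target - sorted_keys[lo]) * (hi - lo) // (
            sorted_keys[hi] - sorted_keys[lo]
        )
        if sorted_keys[pivot] == target:
            return pivot
        if sorted_keys[pivot] < target:
            lo = pivot + 1
        else:
            hi = pivot - 1
    return None


table = ProbingTable(capacity=8)
table.insert(1, -0.5)
table.insert(9, -1.5)  # 9 % 8 == 1, so this collides and lands in the next slot
records = [2, 3, 5, 8, 13, 21, 34, 55]
```

In this sketch a failed probe stops at the first empty slot, which is why the table refuses to fill completely. The interpolation step replaces the midpoint of binary search with a position estimate based on the key values, which pays off when keys are close to uniformly distributed.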