Statistical machine translation, like other areas of human language processing, has recently pushed toward the use of large-scale n-gram language models. This paper presents efficient algorithmic and architectural solutions that have been tested within the Moses decoder, an open-source toolkit for statistical machine translation. Experiments are reported with a high-performing baseline, trained on the Chinese-English NIST 2006 Evaluation task and running on a standard 64-bit Linux PC architecture. Comparative tests show that our representation halves the memory required by the SRI LM Toolkit, at the cost of 44% slower translation speed. However, because it can take advantage of memory mapping on disk, the proposed implementation scales up much better to very large language models: decoding with a 289-million 5-gram language model runs in 2.1 GB of RAM.
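The scale-up claim rests on memory mapping: the model file is mapped into the process's address space, and the operating system pages in only the entries the decoder actually touches, so resident memory stays far below the file size. The following is a minimal, hypothetical Python sketch of that idea, not the Moses/IRSTLM implementation; the flat table of 4-byte probabilities and the `lookup` helper are illustrative assumptions.

```python
# Illustrative only: a flat file of packed 32-bit floats, accessed via mmap
# so a single lookup reads one 4-byte record without loading the whole file.
import mmap
import os
import struct
import tempfile

def write_table(path, probs):
    """Pack a list of probabilities as little-endian 32-bit floats."""
    with open(path, "wb") as f:
        for p in probs:
            f.write(struct.pack("<f", p))

def lookup(path, index):
    """Memory-map the table and read the float at the given record index."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            (p,) = struct.unpack_from("<f", mm, index * 4)
            return p
        finally:
            mm.close()

path = os.path.join(tempfile.mkdtemp(), "lm.bin")
write_table(path, [0.25, 0.5, 0.125])
print(round(lookup(path, 1), 3))  # 0.5
```

A real n-gram model adds a trie or hashed index over the n-gram keys on top of such a table, but the paging behavior sketched here is what lets decoding with a multi-hundred-million-entry model fit in a couple of gigabytes of RAM.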