Language models for machine translation: original vs. translated texts
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Language models are used in a wide variety of natural language applications, including machine translation, speech recognition, spelling correction, and optical character recognition. Recent studies have shown that more data is better data and that bigger language models are better language models: nearly constant machine translation improvements were observed with each doubling of the training data size, even at 2 trillion tokens (yielding 400 billion n-grams). Training and using such large models is a challenge. This tutorial presents efficient methods for distributed training of large language models based on the MapReduce computing model. We also show efficient ways of using distributed models in which requesting individual n-grams is expensive, because each request requires communication between machines.
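The core of MapReduce-based language model training is counting n-grams over a sharded corpus: mappers emit (n-gram, 1) pairs, and reducers sum the counts per key. The sketch below is a minimal single-process illustration of that map/reduce decomposition, not the tutorial's actual distributed implementation; all function names and the tiny toy corpus are assumptions for illustration.

```python
from collections import Counter
from itertools import chain

def map_ngrams(sentence, max_order=3):
    # Mapper: emit (n-gram, 1) pairs for all orders up to max_order.
    tokens = sentence.split()
    for order in range(1, max_order + 1):
        for i in range(len(tokens) - order + 1):
            yield (tuple(tokens[i:i + order]), 1)

def reduce_counts(pairs):
    # Reducer: sum counts for each distinct n-gram key.
    # In a real MapReduce job, the framework groups pairs by key
    # across machines before this step runs.
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts

corpus = ["the cat sat", "the cat ran"]
counts = reduce_counts(chain.from_iterable(map_ngrams(s) for s in corpus))
```

In a distributed setting the resulting counts are themselves sharded (e.g. by a hash of the n-gram's first words), which is why looking up individual n-grams at decoding time requires network round-trips and benefits from batching requests.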