Language models play an important role in large-vocabulary speech recognition and statistical machine translation systems. Back-off language models have been the dominant approach for several decades. Some years ago, the clear trend was to build huge language models trained on hundreds of billions of words; lately this has changed, and recent work concentrates on data selection. Continuous space methods are a very competitive approach, but their high computational complexity has so far kept them out of widespread use. This paper presents an experimental comparison of all these approaches on a large statistical machine translation task. We also describe an open-source implementation for training and using continuous space language models (CSLMs) at this scale, including an efficient implementation of the CSLM on Nvidia graphics processing units. By these means, we are able to train a CSLM on more than 500 million words in 20 hours. This CSLM yields an improvement of up to 1.8 BLEU points over the best back-off language model we were able to build.
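As a rough illustration of what a CSLM computes, the following is a minimal NumPy sketch of a Bengio-style neural probabilistic language model forward pass: the n-1 history words are mapped through a shared projection matrix into continuous space, concatenated, passed through a tanh hidden layer, and a softmax over an output shortlist yields the word probabilities. All layer sizes, variable names, and the shortlist size below are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

# Hypothetical sizes (illustrative only, not the paper's configuration).
VOCAB = 16384      # output shortlist size for the softmax layer
CONTEXT = 3        # n-1 history words, i.e. a 4-gram CSLM
EMB = 256          # dimension of the continuous word representation
HIDDEN = 512       # hidden layer size

rng = np.random.default_rng(0)

# Shared projection matrix: one continuous vector per vocabulary word.
R = rng.normal(scale=0.01, size=(VOCAB, EMB))
# Hidden and output layer parameters.
W_h = rng.normal(scale=0.01, size=(CONTEXT * EMB, HIDDEN))
b_h = np.zeros(HIDDEN)
W_o = rng.normal(scale=0.01, size=(HIDDEN, VOCAB))
b_o = np.zeros(VOCAB)

def cslm_probs(context_ids):
    """Return P(w | history) over the shortlist for one n-gram history.

    context_ids: array of n-1 word indices (the history words).
    """
    # 1. Project each history word into continuous space and concatenate.
    x = R[context_ids].reshape(-1)        # shape: (CONTEXT * EMB,)
    # 2. Non-linear hidden layer.
    h = np.tanh(x @ W_h + b_h)            # shape: (HIDDEN,)
    # 3. Softmax over the output shortlist.
    z = h @ W_o + b_o
    z -= z.max()                          # for numerical stability
    p = np.exp(z)
    return p / p.sum()

p = cslm_probs(np.array([12, 407, 9]))    # e.g. one 4-gram history
print(p.shape, p.sum())                   # (16384,) 1.0
```

Training such a model over hundreds of millions of words makes the softmax over the shortlist the dominant cost, which is what motivates the GPU implementation described above.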