Training continuous space language models: some practical issues

  • Authors:
  • Le Hai Son, Alexandre Allauzen, Guillaume Wisniewski, François Yvon

  • Affiliations:
  • Univ. Paris-Sud, France and LIMSI/CNRS, Orsay Cedex (all authors)

  • Venue:
  • EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2010

Abstract

Using multi-layer neural networks to estimate the probabilities of word sequences is a promising research area in statistical language modeling, with applications in speech recognition and statistical machine translation. However, training such models for large-vocabulary tasks is computationally challenging and does not scale easily to the huge corpora that are available nowadays. In this work, we study the performance and behavior of two neural statistical language models so as to highlight some important caveats of the classical training algorithms. We also analyse the word embeddings induced in extreme cases, which provides insight into the convergence issues. We then introduce a new initialization scheme and new training techniques, and show that they greatly reduce the training time and significantly improve performance, both in terms of perplexity and on a large-scale translation task.
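To make the setting concrete: the "continuous space" language models referred to here are feedforward neural network language models in the tradition of Bengio et al., in which each context word is mapped to a continuous embedding, the concatenated embeddings pass through a hidden layer, and a softmax over the full vocabulary gives the next-word probability; the output softmax is the main scalability bottleneck the abstract alludes to. The sketch below is a minimal, generic version of such a model in PyTorch; the class name, hyperparameters, and toy data are illustrative assumptions, not the specific models or settings studied in the paper.

```python
# Minimal sketch of a feedforward continuous-space language model
# (Bengio-style n-gram NNLM). Names, hyperparameters, and the toy data
# below are illustrative assumptions, not the paper's models or settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedforwardNNLM(nn.Module):
    def __init__(self, vocab_size, context_size=3, embed_dim=128, hidden_dim=256):
        super().__init__()
        # Projection layer: maps each context word to a continuous embedding.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Hidden layer over the concatenated context embeddings.
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
        # Output layer: one score per vocabulary word, normalised by softmax.
        # For large vocabularies this layer dominates the training cost.
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context):                       # context: (batch, context_size)
        e = self.embed(context).flatten(start_dim=1)  # (batch, context_size * embed_dim)
        h = torch.tanh(self.hidden(e))
        return F.log_softmax(self.output(h), dim=-1)  # log P(w | context)

# Toy usage: one stochastic gradient step on random word indices.
vocab_size = 10_000
model = FeedforwardNNLM(vocab_size)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

context = torch.randint(0, vocab_size, (32, 3))  # batch of 3-word contexts
target = torch.randint(0, vocab_size, (32,))     # next-word indices

loss = F.nll_loss(model(context), target)        # negative log-likelihood
loss.backward()
optimizer.step()
```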