Training continuous space language models: some practical issues

  • Authors:
  • Le Hai Son, Alexandre Allauzen, Guillaume Wisniewski, François Yvon

  • Affiliations:
  • Univ. Paris-Sud, France and LIMSI/CNRS, Orsay Cedex (all authors)

  • Venue:
  • EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2010

Abstract

Using multi-layer neural networks to estimate the probabilities of word sequences is a promising research area in statistical language modeling, with applications in speech recognition and statistical machine translation. However, training such models for large-vocabulary tasks is computationally challenging and does not scale easily to the huge corpora that are available nowadays. In this work, we study the performance and behavior of two neural statistical language models so as to highlight some important caveats of the classical training algorithms. We also analyse the word embeddings induced in extreme cases, which provides insight into the convergence issues. We then introduce a new initialization scheme and new training techniques, and show that they greatly reduce the training time and significantly improve performance, both in terms of perplexity and on a large-scale translation task.
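To make the setting concrete: the "continuous space" language models referred to here are feedforward neural network language models in the tradition of Bengio et al., in which each context word is mapped to a continuous embedding, the concatenated embeddings pass through a hidden layer, and a softmax over the full vocabulary gives the next-word probability; the output softmax is the main scalability bottleneck the abstract alludes to. The sketch below is a minimal, generic version of such a model in PyTorch; the class name, hyperparameters, and toy data are illustrative assumptions, not the specific models or settings studied in the paper.

```python
# Minimal sketch of a feedforward continuous-space language model
# (Bengio-style n-gram NNLM). Names, hyperparameters, and the toy data
# below are illustrative assumptions, not the paper's models or settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedforwardNNLM(nn.Module):
    def __init__(self, vocab_size, context_size=3, embed_dim=128, hidden_dim=256):
        super().__init__()
        # Projection layer: maps each context word to a continuous embedding.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Hidden layer over the concatenated context embeddings.
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
        # Output layer: one score per vocabulary word, normalised by softmax.
        # For large vocabularies this layer dominates the training cost.
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context):                       # context: (batch, context_size)
        e = self.embed(context).flatten(start_dim=1)  # (batch, context_size * embed_dim)
        h = torch.tanh(self.hidden(e))
        return F.log_softmax(self.output(h), dim=-1)  # log P(w | context)

# Toy usage: one stochastic gradient step on random word indices.
vocab_size = 10_000
model = FeedforwardNNLM(vocab_size)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

context = torch.randint(0, vocab_size, (32, 3))  # batch of 3-word contexts
target = torch.randint(0, vocab_size, (32,))     # next-word indices

loss = F.nll_loss(model(context), target)        # negative log-likelihood
loss.backward()
optimizer.step()
```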