Measuring the influence of long range dependencies with neural network language models

  • Authors:
  • Le Hai Son; Alexandre Allauzen; François Yvon

  • Affiliations:
  • Univ. Paris-Sud and LIMSI/CNRS, Orsay cedex, France (all authors)

  • Venue:
  • WLM '12 Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT
  • Year:
  • 2012


Abstract

In spite of their well-known limitations, most notably their use of very local contexts, n-gram language models remain an essential component of many Natural Language Processing applications, such as Automatic Speech Recognition or Statistical Machine Translation. This paper investigates the potential of language models using larger context windows comprising up to the 9 previous words. This study is made possible by the development of several novel Neural Network Language Model architectures, which can easily handle such large context windows. We experimentally observe that extending the context size yields clear gains in terms of perplexity, that the n-gram assumption is statistically reasonable as long as n is sufficiently high, and that efforts should therefore be focused on improving the estimation procedures for such large models.
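
To illustrate the kind of model the abstract refers to, the following is a minimal sketch of a feed-forward n-gram neural network language model in the style of Bengio et al. (2003), in which the number of previous words taken as context can be widened (e.g. up to 9). It is not the authors' exact architecture; all names, dimensions, and hyper-parameters are illustrative assumptions.

# Minimal sketch (assumed, not the paper's architecture) of a feed-forward
# n-gram NNLM with a configurable context window.
import torch
import torch.nn as nn

class FeedForwardNNLM(nn.Module):
    def __init__(self, vocab_size, context_size=9, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)          # word index -> vector
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)              # scores over the vocabulary

    def forward(self, context_ids):
        # context_ids: (batch, context_size) indices of the previous words
        e = self.embed(context_ids)                  # (batch, context_size, embed_dim)
        h = torch.tanh(self.hidden(e.flatten(1)))    # concatenate the context embeddings
        return self.out(h)                           # unnormalised next-word scores

# Usage: predict the next word from the 9 previous words.
model = FeedForwardNNLM(vocab_size=10000, context_size=9)
context = torch.randint(0, 10000, (4, 9))            # batch of 4 contexts
log_probs = torch.log_softmax(model(context), dim=-1)
print(log_probs.shape)                                # torch.Size([4, 10000])

Widening context_size in such a model directly corresponds to the larger context windows studied in the paper, at the cost of a proportionally larger input layer.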