In spite of their well-known limitations, most notably their use of very local contexts, n-gram language models remain an essential component of many Natural Language Processing applications, such as Automatic Speech Recognition or Statistical Machine Translation. This paper investigates the potential of language models using larger context windows, comprising up to the 9 previous words. This study is made possible by the development of several novel Neural Network Language Model architectures, which can easily accommodate such large context windows. We experimentally observed that extending the context size yields clear gains in terms of perplexity, that the n-gram assumption is statistically reasonable as long as n is sufficiently high, and that efforts should therefore be focused on improving the estimation procedures for such large models.
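
As a rough illustration of the two quantities the abstract discusses, the context length n and perplexity, the following minimal sketch (not from the paper; the NGramLM class, the add-alpha smoothing and the toy corpus are all illustrative assumptions) shows a count-based n-gram model whose order can be varied and whose perplexity is then measured:

    # Hypothetical illustration: a tiny count-based n-gram language model
    # with add-alpha smoothing, showing how the context length n enters
    # the model and how perplexity is computed.
    import math
    from collections import defaultdict

    class NGramLM:
        def __init__(self, n, alpha=0.1):
            self.n = n                    # context window = n-1 previous words
            self.alpha = alpha            # add-alpha smoothing constant
            self.counts = defaultdict(lambda: defaultdict(int))
            self.vocab = set()

        def _events(self, sentence):
            # Pad with start symbols so every word has a full-length context.
            padded = ["<s>"] * (self.n - 1) + sentence + ["</s>"]
            for i in range(self.n - 1, len(padded)):
                yield tuple(padded[i - self.n + 1:i]), padded[i]

        def train(self, corpus):
            for sentence in corpus:
                self.vocab.update(sentence + ["</s>"])
                for context, word in self._events(sentence):
                    self.counts[context][word] += 1

        def prob(self, context, word):
            c = self.counts[context]
            total = sum(c.values())
            return (c[word] + self.alpha) / (total + self.alpha * len(self.vocab))

        def perplexity(self, corpus):
            log_prob, n_tokens = 0.0, 0
            for sentence in corpus:
                for context, word in self._events(sentence):
                    log_prob += math.log(self.prob(context, word))
                    n_tokens += 1
            return math.exp(-log_prob / n_tokens)

    if __name__ == "__main__":
        corpus = [s.split() for s in ["the cat sat on the mat",
                                      "the dog sat on the rug",
                                      "the cat chased the dog"]]
        for n in (2, 3, 4):               # widening the context window
            lm = NGramLM(n)
            lm.train(corpus)
            print(n, round(lm.perplexity(corpus), 2))

This sketch only covers count-based estimation; the abstract's point is that neural network architectures are what make much larger contexts (up to the 9 previous words) practical to estimate, and that improving such estimation procedures is where effort should be spent.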