Improving a statistical language model through non-linear prediction

  • Authors:
  • Andriy Mnih; Zhang Yuecheng; Geoffrey Hinton

  • Affiliations:
  • University of Toronto, Department of Computer Science, Toronto, Ontario, Canada (all authors)

  • Venue:
  • Neurocomputing
  • Year:
  • 2009

Abstract

We show how to improve a state-of-the-art neural network language model that converts the previous "context" words into feature vectors and combines these feature vectors linearly to predict the feature vector of the next word. Significant improvements in predictive accuracy are achieved by using a non-linear subnetwork either to modulate the effects of the context words or to produce a non-linear correction term when predicting the feature vector. A log-bilinear language model that incorporates both of these improvements achieves a 26% reduction in perplexity over the best n-gram model on a fairly large dataset.
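The idea in the abstract can be sketched roughly as follows: each context word contributes a feature vector, the vectors are combined linearly through per-position matrices to predict the next word's feature vector, and a small non-linear subnetwork adds a correction term before the prediction is scored against every word's feature vector. This is a minimal illustrative sketch, not the authors' exact architecture; all dimensions, initializations, and the two-layer tanh correction network are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

V, D, n, H = 50, 16, 3, 32  # vocab size, feature dim, context length, hidden units (all illustrative)

R = rng.normal(scale=0.1, size=(V, D))                       # feature vector for each word
C = [rng.normal(scale=0.1, size=(D, D)) for _ in range(n)]   # one combination matrix per context position
W1 = rng.normal(scale=0.1, size=(n * D, H))                  # hypothetical non-linear correction subnetwork
W2 = rng.normal(scale=0.1, size=(H, D))
b = np.zeros(V)                                              # per-word biases

def predict_distribution(context_ids):
    """Predict P(next word | context) from n context word ids."""
    feats = [R[w] for w in context_ids]
    # Linear part: combine the context feature vectors with position matrices.
    r_hat = sum(Ci @ f for Ci, f in zip(C, feats))
    # Non-linear part: a small tanh network produces a correction term.
    h = np.tanh(np.concatenate(feats) @ W1)
    r_hat = r_hat + h @ W2
    # Log-bilinear scoring: inner product of the predicted feature
    # vector with every word's feature vector, then a softmax.
    scores = R @ r_hat + b
    e = np.exp(scores - scores.max())
    return e / e.sum()

p = predict_distribution([3, 17, 42])
```

In the actual paper the non-linear subnetwork can also modulate the context matrices themselves rather than only adding a correction term; the sketch above shows just the additive-correction variant for brevity.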