Training neural network language models on very large corpora

  • Authors:
  • Holger Schwenk; Jean-Luc Gauvain

  • Affiliations:
  • LIMSI-CNRS, Orsay Cedex, France; LIMSI-CNRS, Orsay Cedex, France

  • Venue:
  • HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
  • Year:
  • 2005


Abstract

In recent years there has been growing interest in using neural networks for language modeling. In contrast to the well-known back-off n-gram language models, the neural network approach attempts to overcome the data sparseness problem by performing the estimation in a continuous space. This type of language model has mostly been used for tasks for which only a very limited amount of in-domain training data is available. In this paper we present new algorithms to train a neural network language model on very large text corpora. This makes it possible to use the approach in domains where several hundred million words of text are available. The neural network language model is evaluated in a state-of-the-art real-time continuous speech recognizer for French Broadcast News. Word error rate reductions of 0.5% absolute are reported using only a very limited amount of additional processing time.
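
For orientation, the sketch below illustrates the kind of continuous-space n-gram language model the abstract refers to: the previous n-1 words are mapped to learned continuous vectors, and a small feed-forward network with a softmax output estimates the probability distribution of the next word. This is a minimal NumPy toy, not the paper's implementation; the vocabulary, corpus, hyper-parameters, and plain SGD training loop are illustrative assumptions (the paper itself concerns algorithms for training such models efficiently on very large corpora).

```python
# Minimal sketch of a continuous-space (feed-forward) n-gram language model.
# Everything below (toy corpus, sizes, learning rate) is illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus and vocabulary (hypothetical).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
word_to_id = {w: i for i, w in enumerate(vocab)}
V = len(vocab)          # vocabulary size
n = 3                   # model order: predict w_t from the n-1 previous words
d = 16                  # dimension of the continuous word representation
h = 32                  # hidden layer size

# Parameters: shared word-embedding matrix plus a one-hidden-layer MLP.
E  = rng.normal(0, 0.1, (V, d))            # continuous word representations
W1 = rng.normal(0, 0.1, ((n - 1) * d, h))  # projection layer -> hidden layer
b1 = np.zeros(h)
W2 = rng.normal(0, 0.1, (h, V))            # hidden layer -> output scores
b2 = np.zeros(V)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Build (context, next-word) training pairs from the toy corpus.
ids = [word_to_id[w] for w in corpus]
data = [(ids[i:i + n - 1], ids[i + n - 1]) for i in range(len(ids) - n + 1)]

lr = 0.1
for epoch in range(200):
    loss = 0.0
    for ctx, target in data:
        # Forward pass: concatenate the context embeddings, then MLP + softmax.
        x = np.concatenate([E[c] for c in ctx])
        a = np.tanh(x @ W1 + b1)
        p = softmax(a @ W2 + b2)
        loss -= np.log(p[target])

        # Backward pass (cross-entropy gradients) and plain SGD updates.
        dp = p.copy()
        dp[target] -= 1.0
        dW2 = np.outer(a, dp)
        da = (W2 @ dp) * (1.0 - a ** 2)    # tanh derivative
        dW1 = np.outer(x, da)
        dx = W1 @ da
        W2 -= lr * dW2; b2 -= lr * dp
        W1 -= lr * dW1; b1 -= lr * da
        for j, c in enumerate(ctx):
            E[c] -= lr * dx[j * d:(j + 1) * d]
    if epoch % 50 == 0:
        print(f"epoch {epoch}: avg neg log-likelihood {loss / len(data):.3f}")

# Query the trained toy model: P(w | "sat on") over the toy vocabulary.
ctx = [word_to_id["sat"], word_to_id["on"]]
x = np.concatenate([E[c] for c in ctx])
p = softmax(np.tanh(x @ W1 + b1) @ W2 + b2)
print({w: round(float(p[word_to_id[w]]), 3) for w in vocab})
```

Because every probability estimate requires a full forward pass and a softmax over the vocabulary, naive training and inference scale poorly with corpus and vocabulary size, which is precisely the cost the paper's training algorithms and the limited-additional-processing-time evaluation address.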