This paper describes the use of a neural network language model for large vocabulary continuous speech recognition. The underlying idea of this approach is to address the data sparseness problem by performing the language model probability estimation in a continuous space. Highly efficient learning algorithms are described that enable the use of training corpora of several hundred million words. It is also shown that this approach can be incorporated into a large vocabulary continuous speech recognizer using a lattice rescoring framework with very little additional processing time. The neural network language model was thoroughly evaluated in a state-of-the-art large vocabulary continuous speech recognizer on several international benchmark tasks, in particular the NIST evaluations on broadcast news and conversational speech recognition. The new approach is compared to four-gram back-off language models trained with modified Kneser-Ney smoothing, which is widely reported to be the best-performing smoothing method. In practice, the neural network language model is interpolated with the back-off language model. This combination achieved consistent word error rate reductions for all considered tasks and languages, ranging from 0.4% to almost 1% absolute.
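To make the two main ingredients concrete, the sketch below shows a feed-forward continuous-space language model of the kind the abstract describes (history words mapped to continuous embeddings, a hidden layer, and a softmax over the output vocabulary) together with linear interpolation against a back-off n-gram probability. This is a minimal illustration, not the paper's implementation: the vocabulary size, layer dimensions, interpolation weight, and all function names are assumptions chosen for readability, the parameters are random rather than trained, and practical details such as the output shortlist and bunch-mode training are omitted.

```python
# Minimal sketch (illustrative only) of a feed-forward continuous-space LM
# and its interpolation with a back-off n-gram model.
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 1000    # output vocabulary size (assumed; real systems use a shortlist)
EMB = 50        # embedding / projection dimension (assumed)
HIDDEN = 200    # hidden layer size (assumed)
ORDER = 4       # 4-gram model: 3 history words predict the next word

# Randomly initialised parameters stand in for trained ones.
C = rng.normal(0.0, 0.1, (VOCAB, EMB))                 # shared word embeddings
W_h = rng.normal(0.0, 0.1, ((ORDER - 1) * EMB, HIDDEN))
b_h = np.zeros(HIDDEN)
W_o = rng.normal(0.0, 0.1, (HIDDEN, VOCAB))
b_o = np.zeros(VOCAB)

def nn_lm_prob(history, word):
    """P_NN(word | history) from one forward pass through the network."""
    x = np.concatenate([C[w] for w in history])        # projection layer
    h = np.tanh(x @ W_h + b_h)                         # hidden layer
    logits = h @ W_o + b_o
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                               # softmax over the vocabulary
    return probs[word]

def interpolated_prob(history, word, backoff_prob, lam=0.5):
    """Linear interpolation of the neural LM with a back-off n-gram LM;
    the weight lam would be tuned on held-out data in practice."""
    return lam * nn_lm_prob(history, word) + (1.0 - lam) * backoff_prob

# Toy usage: score word 42 given an arbitrary 3-word history, assuming the
# back-off 4-gram model assigned it probability 0.01.
print(interpolated_prob([5, 17, 99], 42, backoff_prob=0.01))
```

In a lattice rescoring setup, probabilities of this interpolated form would replace the back-off language model scores on the lattice arcs, which keeps the additional decoding cost small because only the hypotheses already in the lattice need to be rescored.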