Approximate inference: A sampling based modeling technique to capture complex dependencies in a language model

  • Authors:
  • Anoop Deoras (Microsoft Corporation, 1065 La Avenida, Mountain View, CA 94043, United States)
  • Tomáš Mikolov (Speech@FIT, Brno University of Technology, Brno, Czech Republic)
  • Stefan Kombrink (Speech@FIT, Brno University of Technology, Brno, Czech Republic)
  • Kenneth Church (IBM T.J. Watson Research Center, Yorktown Heights, NY, United States)

  • Venue:
  • Speech Communication
  • Year:
  • 2013

Abstract

In this paper, we present strategies for incorporating long-context information directly during first-pass decoding and during second-pass lattice re-scoring in speech recognition systems. Long-span language models that capture complex syntactic and/or semantic information are seldom used in the first pass of large-vocabulary continuous speech recognition systems because of the prohibitive growth of the sentence-hypothesis search space. Typically, n-gram language models are used in the first pass to produce N-best lists, which are then re-scored using long-span models. Such a pipeline produces biased first-pass output, resulting in sub-optimal performance during re-scoring. In this paper we show that computationally tractable variational approximations of long-span, complex language models are a better choice than the standard n-gram model, both for first-pass decoding and for lattice re-scoring.
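
As a rough sketch of the underlying idea (not the authors' exact pipeline), a long-span model p can be approximated by the n-gram model q that minimizes the Kullback-Leibler divergence KL(p || q); within the n-gram family, this reduces to estimating q from text sampled from p. The Python snippet below illustrates this under stated assumptions: long_span_model is a hypothetical object (e.g., an RNN language model) whose next_distribution(history) method returns a word distribution given the full left context; all names and interfaces here are illustrative, not from the paper.

    import random
    from collections import defaultdict

    def sample_sentence(long_span_model, max_len=30):
        """Ancestral sampling of one sentence from a long-span LM.
        next_distribution(history) is a hypothetical interface that
        returns {word: probability} given the full left context."""
        history = ["<s>"]
        while len(history) < max_len:
            dist = long_span_model.next_distribution(history)
            words, probs = zip(*dist.items())
            word = random.choices(words, weights=probs)[0]
            if word == "</s>":
                break
            history.append(word)
        return history[1:]  # drop the start symbol

    def estimate_ngram(sentences, n=3):
        """Maximum-likelihood n-gram estimates from the sampled corpus
        (a real system would add smoothing, e.g. Kneser-Ney)."""
        counts = defaultdict(lambda: defaultdict(int))
        for sent in sentences:
            padded = ["<s>"] * (n - 1) + sent + ["</s>"]
            for i in range(n - 1, len(padded)):
                context = tuple(padded[i - n + 1:i])
                counts[context][padded[i]] += 1
        model = {}
        for context, followers in counts.items():
            total = sum(followers.values())
            model[context] = {w: c / total for w, c in followers.items()}
        return model

    # Usage sketch: draw a large corpus from the long-span model, then
    # build the tractable n-gram approximation for first-pass decoding.
    # corpus = [sample_sentence(rnn_lm) for _ in range(1_000_000)]
    # q = estimate_ngram(corpus, n=3)

Because the resulting approximation is itself an ordinary n-gram model, it plugs directly into a standard first-pass decoder, which is what makes the long-span information usable before the re-scoring stage.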