A re-scoring strategy is proposed that makes it feasible to capture more long-distance dependencies in natural language. Two-pass strategies have become popular in a number of recognition tasks such as ASR (automatic speech recognition), MT (machine translation), and OCR (optical character recognition). The first pass typically applies a weak language model (n-grams) to a lattice, and the second pass applies a stronger language model to N-best lists. The stronger language model is intended to capture more long-distance dependencies. The proposed method uses an RNN-LM (recurrent neural network language model), a long-span LM, to re-score word lattices in the second pass. A hill-climbing method (iterative decoding) is proposed to search over islands of confusability in the word lattice. An evaluation on Broadcast News shows speedups of a factor of 20 over basic N-best re-scoring, and a word error rate reduction of 8% (relative) on a highly competitive setup.
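The sketch below is a minimal illustration of the hill-climbing (iterative decoding) idea, not the paper's actual implementation: it assumes the lattice has been collapsed into a confusion-network-like list of word positions with first-pass scores, starts from the one-best hypothesis, varies one position (one "island of confusability") at a time, re-scores candidates with an interpolation of the first-pass score and a stronger LM, and keeps the best neighbor until no single substitution improves the score. The names rescore_lm, combined_score, and the interpolation weight are illustrative assumptions; the real method operates on word lattices with an RNN-LM.

```python
# Hedged sketch of second-pass hill climbing over a confusion-network-like
# structure. All interfaces here are assumptions for illustration.
from typing import Callable, Dict, List, Tuple

Position = Dict[str, float]          # word -> first-pass (n-gram + acoustic) score
ConfusionNetwork = List[Position]    # one entry per word position


def combined_score(words: List[str],
                   first_pass: float,
                   rescore_lm: Callable[[List[str]], float],
                   lm_weight: float = 0.5) -> float:
    """Interpolate the first-pass score with the long-span LM score (higher is better)."""
    return (1.0 - lm_weight) * first_pass + lm_weight * rescore_lm(words)


def hill_climb(cn: ConfusionNetwork,
               rescore_lm: Callable[[List[str]], float],
               max_iters: int = 10) -> Tuple[List[str], float]:
    # Start from the first-pass one-best hypothesis.
    hyp = [max(pos, key=pos.get) for pos in cn]
    first_pass = sum(pos[w] for pos, w in zip(cn, hyp))
    best = combined_score(hyp, first_pass, rescore_lm)

    for _ in range(max_iters):
        improved = False
        # Visit one word position ("island of confusability") at a time.
        for i, pos in enumerate(cn):
            for alt, alt_fp in pos.items():
                if alt == hyp[i]:
                    continue
                cand = hyp[:i] + [alt] + hyp[i + 1:]
                cand_fp = first_pass - pos[hyp[i]] + alt_fp
                score = combined_score(cand, cand_fp, rescore_lm)
                if score > best:
                    hyp, first_pass, best, improved = cand, cand_fp, score, True
        if not improved:   # local optimum reached
            break
    return hyp, best


if __name__ == "__main__":
    # Toy example: two positions; the toy LM prefers "the cat".
    cn = [{"the": 0.9, "a": 0.4}, {"cat": 0.5, "hat": 0.6}]
    toy_lm = lambda words: 1.0 if words == ["the", "cat"] else 0.0
    print(hill_climb(cn, toy_lm))
```

Because only the local neighborhood of the current hypothesis is scored with the expensive long-span LM, far fewer word sequences are evaluated than in exhaustive N-best re-scoring, which is the intuition behind the reported speedup.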