Long-span features, such as syntax, can improve language models for tasks such as speech recognition and machine translation. In practice, however, these language models can be difficult to use because of the time required to generate features for rescoring a large hypothesis set. In this work, we propose substructure sharing, which avoids duplicate work when processing hypothesis sets whose hypotheses share structure. We apply substructure sharing to a dependency parser and a part-of-speech tagger to obtain significant speedups, and further improve the accuracy of these tools through up-training. Using these improved tools in a language model for speech recognition, we obtain significant speed improvements with both N-best and hill-climbing rescoring, and show that up-training leads to a reduction in word error rate (WER).
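Substructure sharing exploits the fact that hypotheses in an N-best list overlap heavily, so a history-based tagger or parser repeats the same local decisions across hypotheses. Below is a minimal sketch of the idea for a tagger, assuming (hypothetically) that each tagging decision depends only on the previous tag and a small word window; `tag_hypothesis`, `toy_score_fn`, and the windowing scheme are illustrative stand-ins, not the paper's implementation.

```python
# Minimal sketch of substructure sharing for a history-based tagger.
# Assumption (hypothetical, for illustration): each tagging decision
# depends only on the previous tag and a three-word window, so identical
# (previous tag, window) states recur across overlapping hypotheses and
# can be looked up in a cache instead of recomputed.

def tag_hypothesis(words, score_fn, cache):
    """Tag one hypothesis, reusing cached decisions for repeated states."""
    tags = []
    prev_tag = "<s>"
    for i in range(len(words)):
        # The key covers everything the (assumed) model conditions on.
        window = tuple(words[max(0, i - 1): i + 2])
        key = (prev_tag, window)
        if key not in cache:
            cache[key] = score_fn(prev_tag, window)  # the expensive model call
        prev_tag = cache[key]
        tags.append(prev_tag)
    return tags

def toy_score_fn(prev_tag, window):
    # Stand-in for a real statistical model (hypothetical).
    return "NOUN" if window[-1].endswith("s") else "VERB"

nbest = [
    "the quick brown fox jumps".split(),
    "the quick brown fox jumped".split(),  # shares a four-word prefix
]
cache = {}
for hyp in nbest:
    print(tag_hypothesis(hyp, toy_score_fn, cache))
# States whose windows lie entirely within the shared prefix hit the
# cache on the second hypothesis and cost nothing to reprocess.
```

The cache key must capture the model's full conditioning context; with bounded local context, work across an N-best list grows with the amount of distinct structure rather than the number of hypotheses, which is the effect the abstract describes for its dependency parser and part-of-speech tagger.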