Statistical language models used in deployed systems for speech recognition, machine translation, and other human language technologies are almost exclusively n-gram models. They are regarded as linguistically naïve, but estimating them from any amount of text, large or small, is straightforward. Furthermore, they have doggedly matched or outperformed numerous competing proposals for syntactically well-motivated models. This unusual resilience of n-grams, as well as their weaknesses, is examined here. It is demonstrated that n-grams are good word predictors, even linguistically speaking, in a large majority of word positions, and it is suggested that to improve over n-grams, one must explore syntax-aware (or other) language models that focus on the positions where n-grams are weak.
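The claim that n-gram models can be estimated from raw text with little machinery can be illustrated with a minimal sketch. This is not the paper's own implementation; the function names and the add-alpha smoothing choice are illustrative assumptions, standing in for whatever estimation and smoothing scheme a deployed system would actually use.

```python
from collections import Counter

def train_trigram_lm(sentences):
    """Count trigrams and their bigram contexts from tokenized sentences."""
    trigram_counts = Counter()
    context_counts = Counter()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + tokens + ["</s>"]
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            trigram_counts[(context, padded[i])] += 1
            context_counts[context] += 1
    return trigram_counts, context_counts

def trigram_prob(trigram_counts, context_counts, w1, w2, w3, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(w3 | w1, w2)."""
    context = (w1, w2)
    numerator = trigram_counts[(context, w3)] + alpha
    denominator = context_counts[context] + alpha * vocab_size
    return numerator / denominator

# Usage: estimate the model from a toy corpus and query one conditional probability.
corpus = [["the", "cat", "sat"], ["the", "cat", "slept"]]
tri, ctx = train_trigram_lm(corpus)
vocab = {w for s in corpus for w in s} | {"</s>"}
print(trigram_prob(tri, ctx, "the", "cat", "sat", len(vocab)))
```

The simplicity of this counting procedure, which scales to arbitrarily large corpora, is precisely the practical advantage the abstract contrasts with syntactically well-motivated models.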