Statistical language models used in deployed systems for speech recognition, machine translation, and other human language technologies are almost exclusively n-gram models. They are regarded as linguistically naïve, but estimating them from any amount of text, large or small, is straightforward. Furthermore, they have doggedly matched or outperformed numerous competing proposals for syntactically well-motivated models. This unusual resilience of n-grams, as well as their weaknesses, is examined here. It is demonstrated that n-grams are good word predictors, even linguistically speaking, in a large majority of word positions, and it is suggested that to improve over n-grams, one must explore syntax-aware (or other) language models that focus on the positions where n-grams are weak.
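The claim that n-gram models can be estimated from raw text with little machinery can be illustrated with a minimal sketch. This is not the paper's own implementation; the function names and the add-alpha smoothing choice are illustrative assumptions, standing in for whatever estimation and smoothing scheme a deployed system would actually use.

```python
from collections import Counter

def train_trigram_lm(sentences):
    """Count trigrams and their bigram contexts from tokenized sentences."""
    trigram_counts = Counter()
    context_counts = Counter()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + tokens + ["</s>"]
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            trigram_counts[(context, padded[i])] += 1
            context_counts[context] += 1
    return trigram_counts, context_counts

def trigram_prob(trigram_counts, context_counts, w1, w2, w3, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(w3 | w1, w2)."""
    context = (w1, w2)
    numerator = trigram_counts[(context, w3)] + alpha
    denominator = context_counts[context] + alpha * vocab_size
    return numerator / denominator

# Usage: estimate the model from a toy corpus and query one conditional probability.
corpus = [["the", "cat", "sat"], ["the", "cat", "slept"]]
tri, ctx = train_trigram_lm(corpus)
vocab = {w for s in corpus for w in s} | {"</s>"}
print(trigram_prob(tri, ctx, "the", "cat", "sat", len(vocab)))
```

The simplicity of this counting procedure, which scales to arbitrarily large corpora, is precisely the practical advantage the abstract contrasts with syntactically well-motivated models.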