A structured language model based on context-sensitive probabilistic left-corner parsing

  • Authors:
  • Dong Hoon Van Uytsel; Filip Van Aelten; Dirk Van Compernolle

  • Affiliations:
  • Katholieke Universiteit Leuven, ESAT, Belgium; Lernout & Hauspie, Belgium; Katholieke Universiteit Leuven, ESAT, Belgium

  • Venue:
  • NAACL '01: Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies
  • Year:
  • 2001


Abstract

Recent contributions to statistical language modeling for speech recognition have shown that probabilistically parsing a partial word sequence aids the prediction of the next word, leading to "structured" language models that have the potential to outperform n-grams. Existing approaches to structured language modeling construct nodes in the partial parse tree after all of the underlying words have been predicted. This paper presents a different approach, based on probabilistic left-corner grammar (PLCG) parsing, that extends a partial parse both from the bottom up and from the top down, leading to a more focused and more accurate, though somewhat less robust, search of the parse space. At the core of our new structured language model is a fast context-sensitive and lexicalized PLCG parsing algorithm that uses dynamic programming. Preliminary perplexity and word-accuracy results appear to be competitive with previous ones, while speed is increased.
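The abstract does not spell out the parsing moves, but a probabilistic left-corner parser is conventionally described by three scored operations: shift (read the next word as a preterminal), project (announce a rule whose left corner is the constituent just completed), and attach (unify the completed constituent with the pending top-down goal). The toy Python sketch below illustrates only this general left-corner scheme and how summing over derivations yields a sentence probability; the grammar, the probabilities, and the fixed P_ATTACH parameter are invented for illustration, and the exhaustive enumeration stands in for the paper's actual lexicalized, context-sensitive, dynamic-programming algorithm.

```python
# A toy probabilistic left-corner grammar (PLCG): every value here is invented
# for illustration and is NOT the grammar or parameterization used in the paper.
RULES = [            # (lhs, rhs, probability of projecting lhs from its left corner rhs[0])
    ("S",  ("NP", "VP"), 1.0),
    ("NP", ("Det", "N"), 0.6),
    ("NP", ("N",),       0.4),
    ("VP", ("V", "NP"),  1.0),
]
LEXICON = {          # word -> [(preterminal, P(word | preterminal))], used by the shift move
    "the":    [("Det", 1.0)],
    "dog":    [("N", 0.5)],
    "cat":    [("N", 0.5)],
    "chased": [("V", 1.0)],
}
P_ATTACH = 0.5       # crude fixed attach-vs-project probability (a stand-in parameter)


def parse_goal(words, goal, i):
    """Yield (end_position, probability) for constituents of category `goal`
    starting at word index i: shift the next word, then climb via projection."""
    if i >= len(words):
        return
    for pre, p_lex in LEXICON.get(words[i], []):           # shift move
        yield from climb(words, goal, pre, i + 1, p_lex)


def climb(words, goal, done, j, prob):
    """`done` is a completed category ending at j: either attach it to `goal`
    or project a rule whose left corner is `done` and keep climbing."""
    if done == goal:                                       # attach move
        yield j, prob * P_ATTACH
    for lhs, rhs, p_rule in RULES:
        if rhs[0] != done:
            continue
        # project move; reserve (1 - P_ATTACH) when attaching was also possible
        p = prob * p_rule * ((1.0 - P_ATTACH) if done == goal else 1.0)
        for k, p_rest in parse_seq(words, rhs[1:], j, p):  # finish the rule top-down
            yield from climb(words, goal, lhs, k, p_rest)


def parse_seq(words, goals, i, prob):
    """Parse a sequence of predicted (top-down) goal categories starting at i."""
    if not goals:
        yield i, prob
        return
    for j, p in parse_goal(words, goals[0], i):
        yield from parse_seq(words, goals[1:], j, prob * p)


def sentence_probability(sentence):
    """Sum the probabilities of all complete left-corner derivations of S."""
    words = sentence.split()
    return sum(p for j, p in parse_goal(words, "S", 0) if j == len(words))


if __name__ == "__main__":
    print(sentence_probability("the dog chased the cat"))  # a small positive probability
```

Because every shift is conditioned on a partially built parse that already contains predicted (top-down) structure, a model of this shape can score the next word given more than the preceding word identities, which is the sense in which the abstract contrasts it with n-grams and with bottom-up-only structured language models.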