A structured language model based on context-sensitive probabilistic left-corner parsing

  • Authors:
  • Dong Hoon Van Uytsel; Filip Van Aelten; Dirk Van Compernolle

  • Affiliations:
  • Katholieke Universiteit Leuven, ESAT, Belgium; Lernout & Hauspie, Belgium; Katholieke Universiteit Leuven, ESAT, Belgium

  • Venue:
  • NAACL '01: Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies
  • Year:
  • 2001


Abstract

Recent contributions to statistical language modeling for speech recognition have shown that probabilistically parsing a partial word sequence aids the prediction of the next word, leading to "structured" language models that have the potential to outperform n-grams. Existing approaches to structured language modeling construct nodes in the partial parse tree after all of the underlying words have been predicted. This paper presents a different approach, based on probabilistic left-corner grammar (PLCG) parsing, that extends a partial parse both from the bottom up and from the top down, leading to a more focused and more accurate, though somewhat less robust, search of the parse space. At the core of our new structured language model is a fast context-sensitive and lexicalized PLCG parsing algorithm that uses dynamic programming. Preliminary perplexity and word-accuracy results appear to be competitive with previous ones, while speed is increased.
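The abstract does not spell out the parsing moves, but a probabilistic left-corner parser is conventionally described by three scored operations: shift (read the next word as a preterminal), project (announce a rule whose left corner is the constituent just completed), and attach (unify the completed constituent with the pending top-down goal). The toy Python sketch below illustrates only this general left-corner scheme and how summing over derivations yields a sentence probability; the grammar, the probabilities, and the fixed P_ATTACH parameter are invented for illustration, and the exhaustive enumeration stands in for the paper's actual lexicalized, context-sensitive, dynamic-programming algorithm.

```python
# A toy probabilistic left-corner grammar (PLCG): every value here is invented
# for illustration and is NOT the grammar or parameterization used in the paper.
RULES = [            # (lhs, rhs, probability of projecting lhs from its left corner rhs[0])
    ("S",  ("NP", "VP"), 1.0),
    ("NP", ("Det", "N"), 0.6),
    ("NP", ("N",),       0.4),
    ("VP", ("V", "NP"),  1.0),
]
LEXICON = {          # word -> [(preterminal, P(word | preterminal))], used by the shift move
    "the":    [("Det", 1.0)],
    "dog":    [("N", 0.5)],
    "cat":    [("N", 0.5)],
    "chased": [("V", 1.0)],
}
P_ATTACH = 0.5       # crude fixed attach-vs-project probability (a stand-in parameter)


def parse_goal(words, goal, i):
    """Yield (end_position, probability) for constituents of category `goal`
    starting at word index i: shift the next word, then climb via projection."""
    if i >= len(words):
        return
    for pre, p_lex in LEXICON.get(words[i], []):           # shift move
        yield from climb(words, goal, pre, i + 1, p_lex)


def climb(words, goal, done, j, prob):
    """`done` is a completed category ending at j: either attach it to `goal`
    or project a rule whose left corner is `done` and keep climbing."""
    if done == goal:                                       # attach move
        yield j, prob * P_ATTACH
    for lhs, rhs, p_rule in RULES:
        if rhs[0] != done:
            continue
        # project move; reserve (1 - P_ATTACH) when attaching was also possible
        p = prob * p_rule * ((1.0 - P_ATTACH) if done == goal else 1.0)
        for k, p_rest in parse_seq(words, rhs[1:], j, p):  # finish the rule top-down
            yield from climb(words, goal, lhs, k, p_rest)


def parse_seq(words, goals, i, prob):
    """Parse a sequence of predicted (top-down) goal categories starting at i."""
    if not goals:
        yield i, prob
        return
    for j, p in parse_goal(words, goals[0], i):
        yield from parse_seq(words, goals[1:], j, prob * p)


def sentence_probability(sentence):
    """Sum the probabilities of all complete left-corner derivations of S."""
    words = sentence.split()
    return sum(p for j, p in parse_goal(words, "S", 0) if j == len(words))


if __name__ == "__main__":
    print(sentence_probability("the dog chased the cat"))  # a small positive probability
```

Because every shift is conditioned on a partially built parse that already contains predicted (top-down) structure, a model of this shape can score the next word given more than the preceding word identities, which is the sense in which the abstract contrasts it with n-grams and with bottom-up-only structured language models.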