LR parsing for tree adjoining grammars and its application to corpus-based natural language parsing

  • Authors:
  • Carlos Augusto Prolo;Aravind K. Joshi

  • Venue:
  • LR parsing for tree adjoining grammars and its application to corpus-based natural language parsing
  • Year:
  • 2003

Abstract

This thesis develops the formal aspects of LR parsing for Tree Adjoining Grammars (TAGs) and investigates its application to best-parse parsing of natural language. It is argued that the combination of TAG and the LR technique benefits both. On the one hand, starting from the observation that the rich descriptions provided by Tree Adjoining Grammars are better suited to describing natural languages than Context Free Grammars are, we look for an efficient parsing algorithm that yields a single analysis per sentence. On the other hand, the LR technique is argued to benefit from the rich descriptive power of Tree Adjoining Grammars, leading to better parsing accuracy than when it is applied to Context Free Grammars. [...] conception of LR parsing. We then propose practical, efficient algorithms for LR table generation for TAGs and for a TAG subclass, the Tree Insertion Grammars (TIGs). Decision procedures are presented for conflict resolution during parsing so as to allow the pursuit of the single best parse for any given sentence. These procedures are statistical in nature. We obtain their statistical parameters from a large syntactically annotated corpus, the Penn Treebank, from which the corresponding grammars are also extracted. Thanks to the rich descriptions provided by the elementary trees of Tree Adjoining Grammars, we were able to use a deterministic parsing strategy with limited backtracking (instead of an exhaustive, parallel all-paths search), thus obtaining a very fast parser with good accuracy. Finally, we propose more elaborate techniques for further improving the parsing accuracy of our LR parser for TAG.

Note: This dissertation is a compound document (it contains both a paper copy and a CD as part of the dissertation).