Fast LR parsing using rich (Tree Adjoining) Grammars
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Coping with problems in grammars automatically extracted from Treebanks
COLING-GEE '02 Proceedings of the 2002 workshop on Grammar engineering and evaluation - Volume 15
An efficient LR parser generator for tree-adjoining grammars
New developments in parsing technology
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
This thesis develops the formal aspects of LR parsing for Tree Adjoining Grammars (TAGs) and investigates its application to best-parse parsing of natural language. It argues that combining TAG with the LR technique benefits both. On the one hand, starting from the observation that the rich descriptions provided by Tree Adjoining Grammars are better suited than Context Free Grammars to describing natural languages, we look for an efficient parsing algorithm that yields a single analysis per sentence. In the other direction, the LR technique is argued to benefit from the rich descriptive power of Tree Adjoining Grammars, leading to better parsing accuracy than when it is applied to Context Free Grammars. […] conception of LR parsing. We then propose practical, efficient algorithms for generating LR tables for TAGs and for a TAG subclass, the Tree Insertion Grammars (TIGs). Statistical decision procedures are presented for resolving conflicts during parsing, so that the parser can pursue the single best parse for any given sentence. Their parameters are estimated from a large syntactically annotated corpus, the Penn Treebank, from which the corresponding grammars are also extracted. Thanks to the rich descriptions provided by the elementary trees of Tree Adjoining Grammars, we were able to use a deterministic parsing strategy with limited backtracking (instead of an exhaustive, parallel all-paths search), obtaining a very fast parser with good accuracy. Finally, we propose more elaborate techniques for further improving the parsing accuracy of our LR parser for TAG.

This dissertation is a compound document (it contains both a paper copy and a CD as part of the dissertation).
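The combination of statistically ranked conflict resolution and limited backtracking described in the abstract can be illustrated with a minimal sketch. It assumes a toy context-free grammar and a hand-built LR(0)-style table, not TAG or TIG tables. All names here (`ACTION`, `GOTO`, `SCORES`, `parse`, `max_backtracks`) and the counts in `SCORES` are hypothetical; the counts stand in for parameters that the thesis estimates from the Penn Treebank. When a table cell holds several actions, the driver tries them in order of score and revisits an alternative only if the preferred choice dead-ends, up to a fixed budget. This is a generic illustration, not the thesis's actual decision procedure.

```python
from typing import Dict, List, Optional, Tuple

# An action is ("shift", next_state), ("reduce", production_index) or ("accept", None).
Action = Tuple[str, Optional[int]]

# Hypothetical toy grammar (plain CFG, not a TAG):
#   0: S' -> S   1: S -> A x   2: S -> B y   3: A -> a   4: B -> a
# Each entry is (left-hand side, length of right-hand side).
PRODUCTIONS: List[Tuple[str, int]] = [("S'", 1), ("S", 2), ("S", 2), ("A", 1), ("B", 1)]

# Hand-built LR(0)-style table: reductions ignore lookahead, so state 4
# has a reduce/reduce conflict (A -> a vs. B -> a) on every terminal.
ACTION: Dict[Tuple[int, str], List[Action]] = {
    (0, "a"): [("shift", 4)],
    (1, "$"): [("accept", None)],
    (2, "x"): [("shift", 5)],
    (3, "y"): [("shift", 6)],
    **{(4, t): [("reduce", 3), ("reduce", 4)] for t in ("x", "y", "$")},
    **{(5, t): [("reduce", 1)] for t in ("x", "y", "$")},
    **{(6, t): [("reduce", 2)] for t in ("x", "y", "$")},
}
GOTO: Dict[Tuple[int, str], int] = {(0, "S"): 1, (0, "A"): 2, (0, "B"): 3}

# Hypothetical counts standing in for treebank-estimated preferences.
SCORES: Dict[Action, int] = {("reduce", 3): 90, ("reduce", 4): 10}


def parse(tokens: List[str], scores: Dict[Action, int],
          max_backtracks: int = 2) -> Tuple[Optional[object], int]:
    """Return (tree or None, number of backtracks used)."""
    tokens = tokens + ["$"]
    backtracks = 0

    def step(stack: List[int], trees: List[object], pos: int) -> Optional[object]:
        nonlocal backtracks
        actions = ACTION.get((stack[-1], tokens[pos]), [])
        # Try the highest-scoring action first (ties keep table order).
        ranked = sorted(actions, key=lambda act: scores.get(act, 0), reverse=True)
        for i, (kind, arg) in enumerate(ranked):
            if i > 0:  # the preferred choice failed: this is a backtrack
                if backtracks >= max_backtracks:
                    return None
                backtracks += 1
            if kind == "accept":
                return trees[-1]
            if kind == "shift":
                # New lists are built, so failed branches leave no side effects.
                result = step(stack + [arg], trees + [tokens[pos]], pos + 1)
            else:
                lhs, n = PRODUCTIONS[arg]
                children = trees[len(trees) - n:]
                base = stack[:len(stack) - n]
                result = step(base + [GOTO[(base[-1], lhs)]],
                              trees[:len(trees) - n] + [(lhs, children)], pos)
            if result is not None:
                return result
        return None  # no action applies, or every alternative failed

    return step([0], [], 0), backtracks


if __name__ == "__main__":
    # Preferred A -> a dead-ends on "y", so one backtrack selects B -> a.
    print(parse(["a", "y"], SCORES))  # prints (('S', [('B', ['a']), 'y']), 1)
```

For input `a x` the preferred reduction succeeds with no backtracking, while `a y` needs exactly one; with `max_backtracks=0` the second input is rejected instead of being searched exhaustively. Capping backtracking in this way trades completeness for speed, which is the trade-off the abstract describes when contrasting a deterministic strategy with an all-paths search.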