Using large-scale parser output to guide grammar development

Authors:
Ascander Dost;Tracy Holloway King
Affiliations:
Powerset, a Microsoft company;Powerset, a Microsoft company
Venue:
GEAF '09 Proceedings of the 2009 Workshop on Grammar Engineering Across Frameworks
Year:
2009


Abstract

This paper reports on guiding parser development by extracting information from the output of a large-scale parser applied to Wikipedia documents. Data-driven parser improvement is especially important for applications where the corpus may differ from the one originally used to develop the core grammar, and where efficiency concerns affect whether a new construction should be added or an existing analysis modified. The large size of the corpus in question also brings scalability concerns to the foreground.