Annotation schemes and their influence on parsing results

Authors:
Wolfgang Maier
Affiliations:
Universität Tübingen, Tübingen, Germany
Venue:
COLING ACL '06 Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Year:
2006

Abstract

Most work on treebank-based statistical parsing uses only the Wall Street Journal section of the Penn Treebank for evaluation. Because of this quasi-standard, the question of how strongly parsing results depend on the properties of the treebank has often been ignored. In this paper, we use two similar German treebanks, TüBa-D/Z and NeGra, to investigate the role that different annotation decisions play in parsing. To this end, we bring the two treebanks closer to each other by gradually removing or inserting the corresponding annotation components, and we test the performance of a standard PCFG parser on every resulting treebank version. Our results indicate which structures are favorable for parsing and which are not.
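
As a rough illustration of what removing an annotation component and re-estimating a PCFG can look like, here is a minimal Python sketch. It is not the procedure or the parser used in the paper: the tree encoding, the helper names (`remove_label`, `productions`, `pcfg`), the toy sentence, and the intermediate label `FIELD` are all assumptions made up for this example.

```python
from collections import Counter

# Assumed encoding: an internal node is (label, [children]);
# a preterminal is (pos_tag, word). The toy tree is invented,
# not taken from TüBa-D/Z or NeGra.
toy = ("S", [
    ("FIELD", [("NP", [("PRON", "sie")])]),
    ("FIELD", [("V", "liest")]),
    ("FIELD", [("NP", [("DET", "das"), ("N", "Buch")])]),
])

def remove_label(tree, label):
    """Splice out internal non-root nodes named `label`, re-attaching their children."""
    node_label, children = tree
    if isinstance(children, str):  # preterminal: nothing to splice
        return tree
    new_children = []
    for child in children:
        child = remove_label(child, label)
        if not isinstance(child[1], str) and child[0] == label:
            new_children.extend(child[1])  # drop the node, keep its children
        else:
            new_children.append(child)
    return (node_label, new_children)

def productions(tree):
    """Yield (lhs, rhs) for every internal node; lexical rules are skipped."""
    label, children = tree
    if isinstance(children, str):
        return
    yield (label, tuple(child[0] for child in children))
    for child in children:
        yield from productions(child)

def pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = Counter(rule for tree in trees for rule in productions(tree))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

flat = remove_label(toy, "FIELD")
grammar_with = pcfg([toy])      # grammar read off the original annotation
grammar_without = pcfg([flat])  # grammar after removing the FIELD layer
```

Comparing `grammar_with` and `grammar_without` shows how a single annotation decision changes the rule inventory: the flattened version replaces the `S -> FIELD FIELD FIELD` rule with a longer, more specific `S -> NP V NP` rule. In an actual experiment, each treebank version would be used to train and evaluate a parser on held-out data.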