Detecting errors in discontinuous structural annotation

Authors:
Markus Dickinson; W. Detmar Meurers
Affiliations:
The Ohio State University; The Ohio State University
Venue:
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Year:
2005

Citing 7

Trie memory
Communications of the ACM

Achieving an Almost Correct PoS-Tagged Corpus
TSD '02 Proceedings of the 5th International Conference on Text, Speech and Dialogue

Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II

Improving accuracy in word class tagging through the combination of machine learning systems
Computational Linguistics

An annotation scheme for free word order languages
ANLC '97 Proceedings of the fifth conference on Applied natural language processing

On the evaluation and comparison of taggers: the effect of noise in testing corpora
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2

Detecting errors in part-of-speech annotation
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1

Cited 5

Self-organizing n-gram model for automatic word spacing
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics

Flexible text segmentation with structured multilabel classification
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing

Correcting dependency annotation errors
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics

Consistency checking for Treebank alignment
LAW IV '10 Proceedings of the Fourth Linguistic Annotation Workshop

Exploring the data-driven prediction of prepositions in English
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters

Abstract

Consistency of corpus annotation is an essential property for the many uses of annotated corpora in computational and theoretical linguistics. While some research addresses the detection of inconsistencies in positional annotation (e.g., part-of-speech) and continuous structural annotation (e.g., syntactic constituency), no approach has yet been developed for automatically detecting annotation errors in discontinuous structural annotation. This is significant since the annotation of potentially discontinuous stretches of material is increasingly relevant, from treebanks for free word order languages to semantic and discourse annotation. In this paper we discuss how the variation n-gram error detection approach (Dickinson and Meurers, 2003a) can be extended to discontinuous structural annotation. We exemplify the approach by showing how it successfully detects errors in the syntactic annotation of the German TIGER corpus (Brants et al., 2002).