Communications of the ACM
Achieving an Almost Correct PoS-Tagged Corpus
TSD '02 Proceedings of the 5th International Conference on Text, Speech and Dialogue
Building a large annotated corpus of English: the penn treebank
Computational Linguistics - Special issue on using large corpora: II
Improving accuracy in word class tagging through the combination of machine learning systems
Computational Linguistics
An annotation scheme for free word order languages
ANLC '97 Proceedings of the fifth conference on Applied natural language processing
On the evaluation and comparison of taggers: the effect of noise in testing corpora
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Detecting errors in part-of-speech annotation
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Self-organizing n-gram model for automatic word spacing
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Flexible text segmentation with structured multilabel classification
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Correcting dependency annotation errors
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Consistency checking for Treebank alignment
LAW IV '10 Proceedings of the Fourth Linguistic Annotation Workshop
Exploring the data-driven prediction of prepositions in English
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Consistency of corpus annotation is an essential property for the many uses of annotated corpora in computational and theoretical linguistics. While some research addresses the detection of inconsistencies in positional annotation (e.g., part-of-speech) and continuous structural annotation (e.g., syntactic constituency), no approach has yet been developed for automatically detecting annotation errors in discontinuous structural annotation. This is significant since the annotation of potentially discontinuous stretches of material is increasingly relevant, from treebanks for free word order languages to semantic and discourse annotation.

In this paper we discuss how the variation n-gram error detection approach (Dickinson and Meurers, 2003a) can be extended to discontinuous structural annotation. We exemplify the approach by showing how it successfully detects errors in the syntactic annotation of the German TIGER corpus (Brants et al., 2002).
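As a rough illustration of the underlying idea, the sketch below covers only the simplest positional (part-of-speech) case of variation detection, not the discontinuous extension discussed in the paper. It assumes that a word appearing in an identical one-word left and right context with more than one tag is a candidate inconsistency. The function name `variation_trigrams`, the boundary markers, and the toy corpus are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def variation_trigrams(tagged_sentences):
    """Map (left, word, right) contexts to tag sets, keeping only contexts
    in which the middle word received more than one distinct tag."""
    tags_in_context = defaultdict(set)
    for sentence in tagged_sentences:
        # Pad with boundary markers so the first and last words have a context.
        padded = [("<s>", None)] + list(sentence) + [("</s>", None)]
        for i in range(1, len(padded) - 1):
            left_word = padded[i - 1][0]
            word, tag = padded[i]
            right_word = padded[i + 1][0]
            tags_in_context[(left_word, word, right_word)].add(tag)
    # Same words, different tags: variation that may signal an annotation error.
    return {ctx: tags for ctx, tags in tags_in_context.items() if len(tags) > 1}


# Toy, invented data: "off" is tagged RP in one sentence and IN in the other,
# although the surrounding words are identical.
toy_corpus = [
    [("to", "TO"), ("ward", "VB"), ("off", "RP"), ("a", "DT"), ("takeover", "NN")],
    [("to", "TO"), ("ward", "VB"), ("off", "IN"), ("a", "DT"), ("threat", "NN")],
]

suspects = variation_trigrams(toy_corpus)
for context, tags in suspects.items():
    print(context, sorted(tags))
```

A fixed trigram window is a simplifying assumption made here for brevity; flagged contexts are only candidates, since genuinely ambiguous words can legitimately carry different tags in the same local context.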