Post-annotation checking of Prague Dependency Treebank 2.0 data
TSD'06 Proceedings of the 9th International Conference on Text, Speech and Dialogue
We present a new method for the automated discovery of inconsistencies in a complex manually annotated corpus. The proposed technique is based on the Apriori algorithm for mining association rules from datasets. By setting appropriate parameters for the algorithm, we were able to automatically infer highly reliable annotation rules and then search for records that violate the inferred rules. We show that the violations found by this simple technique are often caused by annotation errors. We evaluate the technique on the hand-annotated corpus PDT 2.0, present an error analysis, and show that 20 of the first 100 detected nodes contained an annotation error.
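The idea described above can be sketched in a few lines: mine rules of the form "attribute A has value x ⇒ attribute B has value y" that hold with high support and confidence over the annotated records, then flag records that match a rule's antecedent but contradict its consequent. This is a minimal illustration, not the authors' implementation; the attribute names (`pos`, `afun`), the record representation, and the thresholds are hypothetical, and only single-item antecedents are mined here (the Apriori algorithm proper also builds larger itemsets).

```python
from itertools import combinations
from collections import Counter

def find_rules(records, min_support=0.3, min_confidence=0.85):
    """Mine simple association rules (one item => one item) from records,
    where each record is a set of (attribute, value) pairs."""
    n = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for rec in records:
        items = sorted(rec)
        for item in items:
            item_counts[item] += 1
        for a, b in combinations(items, 2):
            # count both directions, since rules a=>b and b=>a differ
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    rules = []
    for (a, b), cnt in pair_counts.items():
        if cnt / n >= min_support and cnt / item_counts[a] >= min_confidence:
            rules.append((a, b))  # rule: a => b
    return rules

def flag_violations(records, rules):
    """Return indices of records that satisfy a rule's antecedent
    but assign a different value to the consequent's attribute."""
    flagged = []
    for i, rec in enumerate(records):
        attrs = dict(rec)  # assumes one value per attribute
        for (a_attr, a_val), (c_attr, c_val) in rules:
            if (attrs.get(a_attr) == a_val
                    and c_attr in attrs and attrs[c_attr] != c_val):
                flagged.append(i)
                break
    return flagged

# Toy data: 9 consistent nodes and 1 node that breaks the dominant
# pattern pos=V => afun=Pred, so it is reported as a suspect annotation.
records = ([frozenset({("pos", "V"), ("afun", "Pred")})] * 9
           + [frozenset({("pos", "V"), ("afun", "Sb")})])
rules = find_rules(records)
print(flag_violations(records, rules))  # the inconsistent node's index
```

Records flagged this way are only candidates for inspection, which matches the paper's finding that a fraction of the detected nodes (20 of the first 100) turn out to be genuine annotation errors.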