Post-annotation checking of Prague Dependency Treebank 2.0 data
TSD'06 Proceedings of the 9th International Conference on Text, Speech and Dialogue
We present a new method for the automated discovery of inconsistencies in a complex manually annotated corpus. The proposed technique is based on the Apriori algorithm for mining association rules from datasets. By setting appropriate parameters for the algorithm, we were able to automatically infer highly reliable annotation rules and then search for records that violate the inferred rules. We show that the violations found by this simple technique are often caused by annotation errors. We evaluate the technique on the hand-annotated corpus PDT 2.0, present an error analysis, and show that 20 of the first 100 detected nodes contained an annotation error.
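The idea described above can be sketched in a few lines: mine rules of the form "attribute A has value x ⇒ attribute B has value y" that hold with high support and confidence over the annotated records, then flag records that match a rule's antecedent but contradict its consequent. This is a minimal illustration, not the authors' implementation; the attribute names (`pos`, `afun`), the record representation, and the thresholds are hypothetical, and only single-item antecedents are mined here (the Apriori algorithm proper also builds larger itemsets).

```python
from itertools import combinations
from collections import Counter

def find_rules(records, min_support=0.3, min_confidence=0.85):
    """Mine simple association rules (one item => one item) from records,
    where each record is a set of (attribute, value) pairs."""
    n = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for rec in records:
        items = sorted(rec)
        for item in items:
            item_counts[item] += 1
        for a, b in combinations(items, 2):
            # count both directions, since rules a=>b and b=>a differ
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    rules = []
    for (a, b), cnt in pair_counts.items():
        if cnt / n >= min_support and cnt / item_counts[a] >= min_confidence:
            rules.append((a, b))  # rule: a => b
    return rules

def flag_violations(records, rules):
    """Return indices of records that satisfy a rule's antecedent
    but assign a different value to the consequent's attribute."""
    flagged = []
    for i, rec in enumerate(records):
        attrs = dict(rec)  # assumes one value per attribute
        for (a_attr, a_val), (c_attr, c_val) in rules:
            if (attrs.get(a_attr) == a_val
                    and c_attr in attrs and attrs[c_attr] != c_val):
                flagged.append(i)
                break
    return flagged

# Toy data: 9 consistent nodes and 1 node that breaks the dominant
# pattern pos=V => afun=Pred, so it is reported as a suspect annotation.
records = ([frozenset({("pos", "V"), ("afun", "Pred")})] * 9
           + [frozenset({("pos", "V"), ("afun", "Sb")})])
rules = find_rules(records)
print(flag_violations(records, rules))  # the inconsistent node's index
```

Records flagged this way are only candidates for inspection, which matches the paper's finding that a fraction of the detected nodes (20 of the first 100) turn out to be genuine annotation errors.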