Unsupervised detection of annotation inconsistencies using Apriori algorithm

  • Authors:
  • Václav Novák;Magda Razímová

  • Affiliations:
  • Charles University in Prague, Czech Republic;Charles University in Prague, Czech Republic

  • Venue:
  • ACL-IJCNLP '09 Proceedings of the Third Linguistic Annotation Workshop
  • Year:
  • 2009

Abstract

We present a new method for the automated discovery of inconsistencies in complex, manually annotated corpora. The proposed technique is based on the Apriori algorithm for mining association rules from datasets. By setting the algorithm's parameters appropriately, we were able to automatically infer highly reliable annotation rules, and we then searched for records that violate the inferred rules. We show that the violations found by this simple technique are often caused by annotation errors. We evaluate the technique on the hand-annotated corpus PDT 2.0, present an error analysis, and show that 20 of the first 100 detected nodes contained an annotation error.
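The general idea described in the abstract (mine high-confidence association rules, then flag the records that break them) can be illustrated with a minimal sketch. This is not the authors' implementation: the attribute names (`pos`, `case`, `role`), the toy records, and the thresholds `min_support` and `min_confidence` are all hypothetical and chosen only to make the example small.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns {frozenset_of_items: support_count}."""
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unite frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


def mine_rules(frequent, min_confidence):
    """Rules 'antecedent -> single item' whose confidence reaches the threshold."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            # The antecedent is frequent too, because it is a subset of a frequent set.
            confidence = count / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules


def find_violations(transactions, rules):
    """Records that match a rule's antecedent but lack its consequent."""
    hits = []
    for i, t in enumerate(transactions):
        for antecedent, consequent, confidence in rules:
            if antecedent <= t and consequent not in t:
                hits.append((i, antecedent, consequent, confidence))
    return hits


# Toy data (hypothetical): nine consistent records and one deviating record.
records = [{"pos": "noun", "case": "nom", "role": "subject"} for _ in range(9)]
records.append({"pos": "noun", "case": "nom", "role": "object"})

# Each record becomes a set of (attribute, value) items.
transactions = [frozenset(r.items()) for r in records]

frequent = frequent_itemsets(transactions, min_support=5)
rules = mine_rules(frequent, min_confidence=0.85)
violations = find_violations(transactions, rules)

for index, antecedent, consequent, confidence in violations:
    print(index, sorted(antecedent), "->", consequent, round(confidence, 2))
```

In this sketch a flagged record is only a candidate for manual review, which matches the abstract's observation that violations are often, but not always, annotation errors.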