From annotator agreement to noise models

  • Authors:
  • Beata Beigman Klebanov; Eyal Beigman

  • Venue:
  • Computational Linguistics
  • Year:
  • 2009

Abstract

This article discusses the transition from annotated data to a gold standard, that is, a subset that is sufficiently noise-free with high confidence. Unless appropriately reinterpreted, agreement coefficients do not indicate the quality of the data set as a benchmarking resource: High overall agreement is neither sufficient nor necessary to distill some amount of highly reliable data from the annotated material. A mathematical framework is developed that allows estimation of the noise level of the agreed subset of annotated data, which helps promote cautious benchmarking.
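The claim that high overall agreement is neither sufficient nor necessary can be illustrated with a small calculation. The sketch below is not the article's framework; it assumes a deliberately simple toy model in which a fraction of items is "hard", annotators label hard items by an independent fair coin flip, and all annotators label the remaining items identically. The function names and the parameter `hard_fraction` are hypothetical, introduced only for this illustration.

```python
def agreement_rate(hard_fraction: float, annotators: int) -> float:
    """Probability that all annotators assign the same binary label.

    Toy assumption: annotators always agree on easy items and flip
    independent fair coins on hard items.
    """
    # All k coin flips match with probability 2 * 0.5**k = 0.5**(k - 1).
    chance_agreement = 0.5 ** (annotators - 1)
    return (1.0 - hard_fraction) + hard_fraction * chance_agreement


def chance_agreed_share(hard_fraction: float, annotators: int) -> float:
    """Share of the agreed subset whose agreement arose from guessing."""
    chance_agreement = 0.5 ** (annotators - 1)
    agreed_by_chance = hard_fraction * chance_agreement
    return agreed_by_chance / agreement_rate(hard_fraction, annotators)


if __name__ == "__main__":
    # Same data difficulty, different numbers of annotators.
    for k in (2, 5):
        print(k, agreement_rate(0.2, k), chance_agreed_share(0.2, k))
```

Under these assumptions, with 20% hard items, two annotators agree on 90% of the data, yet about 11% of the agreed items were agreed on by chance. Five annotators fully agree on only about 81% of the data, but the chance-agreed share of that subset drops to roughly 1.5%. Lower overall agreement can therefore coexist with a cleaner agreed subset.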