Learning with annotation noise

  • Authors: Eyal Beigman; Beata Beigman Klebanov
  • Affiliations: Washington University in St. Louis; Northwestern University
  • Venue: ACL '09: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Volume 1
  • Year: 2009

Abstract

It is usually assumed that the kind of noise existing in annotated data is random classification noise. Yet there is evidence that differences between annotators are not always random attention slips but could result from different biases towards the classification categories, at least for the harder-to-decide cases. Under an annotation generation model that takes this into account, there is a hazard that some of the training instances are actually hard cases with unreliable annotations. We show that these are relatively unproblematic for an algorithm operating under the 0–1 loss model, whereas for the commonly used voted perceptron algorithm, hard training cases could result in incorrect predictions on the uncontroversial cases at test time.
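
The failure mode the abstract describes can be explored in a small simulation: draw linearly separable data, let a biased annotator relabel the "hard" (small-margin) instances, train a voted perceptron (Freund and Schapire, 1999) on the noisy labels, and measure error on the easy, uncontroversial test cases. The sketch below is illustrative only; the margin thresholds and the annotator bias parameter are assumptions made for the demo, not the paper's exact annotation model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Annotation generation model (demo assumptions): easy cases keep the
# true label; "hard" (small-margin) cases get a label drawn from an
# annotator biased toward the positive class, independent of the truth.
n, d, bias = 500, 2, 0.8
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -1.0])
y = np.sign(X @ w_true)
hard = np.abs(X @ w_true) < 0.3          # small margin = hard case (assumption)
y[hard] = np.where(rng.random(hard.sum()) < bias, 1.0, -1.0)

# Voted perceptron (Freund and Schapire, 1999): keep every intermediate
# hypothesis together with the number of rounds it survived, and predict
# by a survival-weighted vote of the hypotheses' individual predictions.
def train_voted_perceptron(X, y, epochs=10):
    w, c = np.zeros(X.shape[1]), 0
    hypotheses = []                       # list of (weight vector, survival count)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:        # mistake: retire w, start a new hypothesis
                hypotheses.append((w.copy(), c))
                w, c = w + yi * xi, 1
            else:
                c += 1
    hypotheses.append((w.copy(), c))
    return hypotheses

def predict(hypotheses, x):
    vote = sum(c * np.sign(w @ x) for w, c in hypotheses)
    return 1.0 if vote >= 0 else -1.0

# Evaluate only on uncontroversial (large-margin) test points, mirroring
# the paper's concern: errors induced on the easy cases at test time.
hyps = train_voted_perceptron(X, y)
X_test = rng.normal(size=(2000, d))
easy = np.abs(X_test @ w_true) > 0.5
preds = np.array([predict(hyps, x) for x in X_test[easy]])
err = np.mean(preds != np.sign(X_test[easy] @ w_true))
print(f"voted-perceptron error on easy test cases: {err:.3f}")
```

Varying the bias parameter and the hard-case margin lets one observe how biased labels concentrated on the hard instances can pull the weighted vote away from the true boundary even in the easy region, which is the contrast with 0–1 loss minimization that the paper analyzes.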