Annotating large email datasets for named entity recognition with Mechanical Turk

  • Authors:
  • Nolan Lawson; Kevin Eustice; Mike Perkowitz; Meliha Yetisgen-Yildiz

  • Affiliations:
  • Kiha Software, Seattle, WA; Kiha Software, Seattle, WA; Kiha Software, Seattle, WA; University of Washington, Seattle, WA

  • Venue:
  • CSLDAMT '10: Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
  • Year:
  • 2010

Abstract

Amazon's Mechanical Turk service has been successfully applied to many natural language processing tasks. However, the task of named entity recognition presents unique challenges. In a large annotation task involving over 20,000 emails, we demonstrate that a competitive bonus system and inter-annotator agreement can be used to improve the quality of named entity annotations from Mechanical Turk. We also build several statistical named entity recognition models trained with these annotations, which compare favorably to similar models trained on expert annotations.