Crowdsourcing using Mechanical Turk: quality management and scalability

  • Author: Panos Ipeirotis
  • Affiliation: New York University
  • Venue: Proceedings of the 8th International Workshop on Information Integration on the Web: in conjunction with WWW 2011
  • Year: 2011

Abstract

I will discuss the repeated acquisition of "labels" for data items when the labeling is imperfect. Labels are values provided by humans for specified variables on data items, such as "PG-13" for "Adult Content Rating on this Web Page." With the increasing popularity of micro-outsourcing systems, such as Amazon's Mechanical Turk, it is often possible to obtain less-than-expert labeling at low cost. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction.
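
To make the idea of repeated labeling concrete, here is a minimal sketch. It is not taken from the talk: it assumes binary labels, labelers who err independently with the same accuracy p, and simple majority voting as the way to combine labels. The function names majority_label and majority_accuracy are hypothetical, chosen only for this illustration.

```python
from collections import Counter
from math import comb


def majority_label(labels):
    """Return the most frequent label among the repeated labels for one item.

    Ties are broken in favor of the label that appeared first
    (the ordering behavior of Counter.most_common).
    """
    return Counter(labels).most_common(1)[0][0]


def majority_accuracy(p, n):
    """Probability that a majority of n labels is correct.

    Assumption (not from the abstract): binary labels, and each of the
    n labelers is correct independently with the same probability p.
    n must be odd so that a strict majority always exists.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    # Sum the binomial probabilities of getting more than n/2 correct labels.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


if __name__ == "__main__":
    print(majority_label(["PG-13", "PG-13", "R"]))
    for p in (0.4, 0.5, 0.7):
        print(p, [round(majority_accuracy(p, n), 3) for n in (1, 3, 5)])
```

Under these assumptions, extra labels help only when p is above 0.5. At p = 0.5 the majority is no better than a single label, and below 0.5 it is worse, which is one simple way to see the "or lack thereof" caveat.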