Crowd-sourcing approaches such as Amazon's Mechanical Turk (MTurk) make it possible to annotate or collect large amounts of linguistic data at relatively low cost and high speed. However, MTurk offers only limited control over who is allowed to participate in a particular task. This is particularly problematic for tasks requiring free-form text entry: unlike in multiple-choice tasks, there is no single correct answer, so control items for which the correct answer is known cannot be used. Furthermore, MTurk has no effective built-in mechanism to guarantee that workers are proficient English writers. We describe our experience in creating corpora of images annotated with multiple one-sentence descriptions on MTurk, and we explore the effectiveness of different quality control strategies for collecting linguistic data with MTurk. We find that a qualification test yields the largest improvement in quality, whereas refining the annotations through follow-up tasks works rather poorly. Using our best setup, we construct two image corpora totaling more than 40,000 descriptive captions for 9,000 images.