Data quality from crowdsourcing: a study of annotation selection criteria
HLT '09 Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing
Cheap and fast---but is it good?: evaluating non-expert annotations for natural language tasks
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Fast, cheap, and creative: evaluating translation quality using Amazon's Mechanical Turk
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1
Language and Equilibrium
"Was it good? It was provocative." Learning the meaning of scalar adjectives
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Creating speech and language data with Amazon's Mechanical Turk
CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
Using query patterns to learn the duration of events
IWCS '11 Proceedings of the Ninth International Conference on Computational Semantics
The importance of visual context clues in multimedia translation
CLEF'11 Proceedings of the Second international conference on Multilingual and multimodal information access evaluation
CDAS: a crowdsourcing data analytics system
Proceedings of the VLDB Endowment
Crowdsourcing research opportunities: lessons from natural language processing
Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies
Crowdsourcing and the crisis-affected community
Information Retrieval
We present a compendium of recent and current projects that use crowdsourcing technologies for language studies, finding that the quality is comparable to that of controlled laboratory experiments, and in some cases superior. While crowdsourcing has primarily been used for annotation in recent language studies, the results here demonstrate that far richer data can be generated across a range of linguistic disciplines, from semantics to psycholinguistics. For these studies, we report a number of successful methods for evaluating data quality in the absence of a 'correct' response for any given data point.