Perspectives on crowdsourcing annotations for natural language processing

Authors:
Aobo Wang;Cong Duy Hoang;Min-Yen Kan
Affiliations:
AS6 04-13 Computing 1, 13 Computing Drive National University of Singapore, Singapore, Singapore 117417;Human Language Technology Department, Institute for Infocomm Research (I²R), A*STAR, Singapore, Singapore 138632;AS6 05-12 Computing 1, 13 Computing Drive National University of Singapore, Singapore, Singapore 117417
Venue:
Language Resources and Evaluation
Year:
2013

Abstract

Crowdsourcing has emerged as a new method for obtaining annotations for training models for machine learning. While many variants of this process exist, they largely differ in their methods of motivating subjects to contribute and the scale of their applications. To date, there has yet to be a study that helps the practitioner to decide what form an annotation application should take to best reach its objectives within the constraints of a project. To fill this gap, we provide a faceted analysis of crowdsourcing from a practitioner's perspective, and show how our facets apply to existing published crowdsourced annotation applications. We then summarize how the major crowdsourcing genres fill different parts of this multi-dimensional space, which leads to our recommendations on the potential opportunities crowdsourcing offers to future annotation efforts.