Acquiring high quality non-expert knowledge from on-demand workforce

Authors:
Donghui Feng;Sveva Besana;Remi Zajac
Affiliations:
AT&T Interactive Research, Glendale, CA;AT&T Interactive Research, Glendale, CA;AT&T Interactive Research, Glendale, CA
Venue:
People's Web '09 Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources
Year:
2009

Abstract

Because it is expensive and time-consuming, human knowledge acquisition has consistently been a major bottleneck for solving real problems. In this paper, we present a practical framework for acquiring high-quality non-expert knowledge from an on-demand workforce using Amazon Mechanical Turk (MTurk). We show how to apply this framework to collect large-scale human knowledge on AOL query classification quickly and efficiently. Based on extensive experiments and analysis, we demonstrate how to detect low-quality labels in massive data sets and how those labels affect the collection of high-quality knowledge. Our experimental findings also provide insight into best practices for balancing cost and data quality when using MTurk.
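The abstract does not spell out the detection procedure. One common way to flag low-quality labels, when each query receives several redundant MTurk labels, is to compare every worker against the per-query majority vote. The minimal Python sketch below assumes that setup. The function names, the 0.5 agreement threshold, and the category labels are hypothetical illustrations, not the authors' method.

```python
from collections import Counter, defaultdict


def majority_labels(annotations):
    """Return {item_id: majority label} from (worker_id, item_id, label) tuples.

    Ties go to the label seen first for that item (Counter.most_common order).
    """
    votes = defaultdict(Counter)
    for _worker, item, label in annotations:
        votes[item][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


def worker_agreement(annotations, majority):
    """Fraction of each worker's labels that match the per-item majority."""
    agree, total = Counter(), Counter()
    for worker, item, label in annotations:
        total[worker] += 1
        if label == majority[item]:
            agree[worker] += 1
    return {w: agree[w] / total[w] for w in total}


def flag_low_quality(annotations, threshold=0.5):
    """Hypothetical rule: flag workers whose agreement rate is below `threshold`."""
    majority = majority_labels(annotations)
    rates = worker_agreement(annotations, majority)
    return sorted(w for w, rate in rates.items() if rate < threshold)


# Hypothetical example: three workers each label three queries.
annotations = [
    ("w1", "q1", "travel"), ("w2", "q1", "travel"), ("w3", "q1", "shopping"),
    ("w1", "q2", "health"), ("w2", "q2", "health"), ("w3", "q2", "sports"),
    ("w1", "q3", "news"), ("w2", "q3", "news"), ("w3", "q3", "news"),
]
print(flag_low_quality(annotations))  # ['w3'] (agrees on 1 of 3 queries)
```

Each worker's own vote contributes to the majority, so these agreement rates are somewhat optimistic. A leave-one-out majority, or a chance-corrected statistic such as Cohen's kappa, is a stricter alternative.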