High-throughput crowdsourcing mechanisms for complex tasks

Authors:
Guido Sautter; Klemens Böhm
Affiliations:
KIT, Karlsruhe, Germany; KIT, Karlsruhe, Germany
Venue:
SocInfo'11 Proceedings of the Third international conference on Social informatics
Year:
2011

Abstract

Crowdsourcing is popular for large-scale data processing endeavors that require human input. However, working with a large community of users raises new challenges. In particular, both misjudgment and dishonesty threaten the quality of the results. Common countermeasures are based on redundancy, which gives rise to a tradeoff between result quality and throughput. Ideally, measures should (1) maintain high throughput and (2) ensure high result quality at the same time. Existing work on crowdsourcing mostly focuses on result quality, paying little attention to throughput or to the tradeoff itself. One reason is that the number of tasks (individual atomic units of work) is usually small. A further problem is that the tasks users work on are small as well. In consequence, existing result-improvement mechanisms do not scale to the number or complexity of tasks that arise, for instance, in proofreading and processing of digitized legacy literature. This paper proposes novel result-improvement mechanisms that (1) are independent of the size and complexity of tasks and (2) allow trading result quality for throughput to a significant extent. Both mathematical analyses and extensive simulations show the effectiveness of the proposed mechanisms.
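The tradeoff between result quality and throughput under redundancy can be illustrated with a generic majority-vote model. The sketch below is not one of the mechanisms proposed in the paper. It assumes tasks with a binary answer, workers who answer independently and are each correct with the same probability `p`, and an odd redundancy `r` (the number of workers assigned to each task). The function names and the value `p = 0.8` are hypothetical and chosen only for illustration.

```python
from math import comb


def majority_correct_probability(p: float, r: int) -> float:
    """Probability that a strict majority of r workers gives the correct answer.

    Assumptions (illustrative only): binary answers, independent workers,
    identical per-worker accuracy p, and odd r so that ties cannot occur.
    """
    if r < 1 or r % 2 == 0:
        raise ValueError("r must be a positive odd integer")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    votes_needed = r // 2 + 1
    # Sum the binomial probabilities of getting at least votes_needed correct votes.
    return sum(
        comb(r, k) * p**k * (1 - p) ** (r - k)
        for k in range(votes_needed, r + 1)
    )


def relative_throughput(r: int) -> float:
    """Tasks completed per unit of worker effort, relative to no redundancy."""
    if r < 1:
        raise ValueError("r must be positive")
    return 1.0 / r


if __name__ == "__main__":
    assumed_accuracy = 0.8  # hypothetical per-worker accuracy
    for redundancy in (1, 3, 5):
        quality = majority_correct_probability(assumed_accuracy, redundancy)
        throughput = relative_throughput(redundancy)
        print(f"r={redundancy}: quality={quality:.4f}, throughput={throughput:.3f}")
```

Under these assumptions, raising the redundancy from 1 to 3 lifts the probability of a correct result from 0.8 to 0.896 while cutting throughput to one third, which is the kind of tradeoff the abstract refers to.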