Peer consistency evaluation is often used in games with a purpose (GWAPs) to evaluate workers against the outputs of other workers, without relying on gold standard answers. Despite its popularity, the reliability of peer consistency evaluation has never been systematically tested, so it remains unclear whether it can serve as a general evaluation method in human computation systems. We present experimental results showing that human computation systems using peer consistency evaluation can produce outcomes even better than those of systems that evaluate workers against gold standard answers. We also show that, even without any actual evaluation, simply telling workers that their answers will be used as future evaluation standards significantly enhances their performance. These results have important implications for methods that improve the reliability of human computation systems.
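The peer consistency idea is simple: score a worker's answer by how often it agrees with other workers' answers on the same task, rather than against a gold standard. The sketch below is a minimal illustration of that scoring rule under stated assumptions, not the experimental setup from the paper; the function name, data layout, and majority-vote tie-breaking are all choices made here for illustration.

```python
from collections import Counter

def peer_consistency_scores(answers):
    """Score each worker by agreement with the peer majority.

    answers: dict mapping worker -> {task: answer}.
    Returns a dict mapping worker -> fraction of that worker's answers
    matching the majority vote of the *other* workers on the same task.
    """
    scores = {}
    for worker, own in answers.items():
        matches, scored = 0, 0
        for task, answer in own.items():
            # Majority vote among peers, excluding this worker's own answer
            peer_votes = [a[task] for w, a in answers.items()
                          if w != worker and task in a]
            if not peer_votes:
                continue  # no peers answered this task; skip it
            majority, _ = Counter(peer_votes).most_common(1)[0]
            matches += (answer == majority)
            scored += 1
        scores[worker] = matches / scored if scored else 0.0
    return scores

# Example: worker "c" disagrees with the peer majority on task "t2"
answers = {
    "a": {"t1": "cat", "t2": "dog"},
    "b": {"t1": "cat", "t2": "dog"},
    "c": {"t1": "cat", "t2": "fox"},
}
print(peer_consistency_scores(answers))
# {'a': 1.0, 'b': 1.0, 'c': 0.5}
```

A gold-standard variant would compare each answer to a fixed answer key instead of the peer majority; the abstract's finding is that the peer-based scheme can perform at least as well without requiring that key.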