An analysis of human factors and label accuracy in crowdsourcing relevance judgments
Information Retrieval
Crowdsourcing platforms offer unprecedented opportunities for creating evaluation benchmarks, but the quality of the output varies widely across crowd workers, who differ in competence and motivation. This raises new challenges for quality control and requires an in-depth understanding of how workers' characteristics relate to the quality of their work. In this paper, we use behavioral observations (HIT completion time, fraction of useful labels, label accuracy) to define five worker types: Spammer, Sloppy, Incompetent, Competent, and Diligent. Using data collected from workers engaged in the crowdsourced evaluation of the INEX 2010 Book Track Prove It task, we relate the worker types to label accuracy and to personality traits measured along the "Big Five" dimensions. We expect these insights into the types of crowd workers and the quality of their work to inform how HITs can be designed to attract the best workers to a task, and to explain why certain HIT designs are more effective than others.
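To make the typology concrete, the sketch below shows one way a rule-based classification over the three behavioral observations could look. It is a minimal illustration only: the `WorkerStats` fields, the thresholds, and the ordering of the rules are assumptions chosen for exposition, not the cut-offs used in the paper.

```python
from dataclasses import dataclass

@dataclass
class WorkerStats:
    """Behavioral observations aggregated per worker (as named in the abstract)."""
    median_hit_seconds: float     # median HIT completion time
    useful_label_fraction: float  # fraction of submitted labels that were usable
    label_accuracy: float         # agreement with gold-standard judgments

# Hypothetical thresholds: placeholders to illustrate the idea, not published values.
MIN_HIT_SECONDS = 20.0       # faster than this suggests the task was not read
MIN_USEFUL_FRACTION = 0.5
GOOD_ACCURACY = 0.7
HIGH_ACCURACY = 0.85

def classify_worker(w: WorkerStats) -> str:
    """Map a worker's behavioral profile to one of the five worker types."""
    if w.median_hit_seconds < MIN_HIT_SECONDS and w.useful_label_fraction < MIN_USEFUL_FRACTION:
        return "Spammer"      # rushes through HITs and produces few usable labels
    if w.useful_label_fraction < MIN_USEFUL_FRACTION:
        return "Sloppy"       # spends time on the task but leaves many labels unusable
    if w.label_accuracy < GOOD_ACCURACY:
        return "Incompetent"  # engaged, but the labels are often wrong
    if w.label_accuracy < HIGH_ACCURACY:
        return "Competent"
    return "Diligent"         # careful and highly accurate

if __name__ == "__main__":
    print(classify_worker(WorkerStats(8.0, 0.3, 0.20)))   # -> Spammer
    print(classify_worker(WorkerStats(95.0, 0.9, 0.92)))  # -> Diligent
```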