Crowdsourcing has emerged as a viable platform for conducting many types of relevance evaluation. The main reason behind this trend is that it makes it possible to conduct experiments extremely fast, with good results, at low cost. However, as with any experiment, several implementation details determine whether an experiment succeeds or fails. To gather useful results, a successful crowdsourcing experiment requires clear instructions, sound user interface guidelines, high content quality, inter-rater agreement metrics, work quality control, and worker feedback. Furthermore, designing and implementing experiments that require thousands or millions of labels differs from conducting small-scale research investigations. In this paper we outline a framework for conducting continuous crowdsourcing experiments, emphasizing aspects that matter for all sorts of tasks. We illustrate the value of the characteristics that can affect the overall outcome using examples based on TREC, INEX, and Wikipedia data sets.
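One of the quality signals named above is inter-rater agreement between workers. As a minimal illustrative sketch (not taken from the paper), the snippet below computes Cohen's kappa for two workers who judged the same set of topic-document pairs; the worker labels are hypothetical and any real experiment would use its own collected judgments and, for more than two workers, a multi-rater statistic such as Fleiss' kappa.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two assessors labeling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items on which the two assessors agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each assessor's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a.keys() | freq_b.keys())
    return (p_o - p_e) / (1 - p_e)

# Hypothetical binary relevance labels (1 = relevant, 0 = not relevant)
# from two crowd workers judging the same ten topic-document pairs.
worker_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
worker_2 = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]

print(f"Cohen's kappa: {cohens_kappa(worker_1, worker_2):.2f}")  # ~0.58
```

A chance-corrected statistic like this is preferable to raw percentage agreement because relevance labels are often highly skewed, so two workers can agree frequently just by labeling most documents non-relevant.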