In recent years, crowdsourcing has emerged as a viable platform for conducting relevance assessments. The main reason behind this trend is that it makes it possible to run experiments quickly, with good results, and at low cost. However, as in any experiment, many details determine whether an experiment succeeds or fails. To gather useful results, user interface guidelines, inter-agreement metrics, and justification analysis are important aspects of a successful crowdsourcing experiment. In this work we explore the design and execution of relevance judgments using Amazon Mechanical Turk as the crowdsourcing platform, introducing a methodology for crowdsourcing relevance assessments and reporting the results of a series of experiments on TREC 8 with a fixed budget. Our findings indicate that workers are as good as TREC experts, and they even provide detailed feedback for certain query-document pairs. We also explore the importance of document design and presentation when performing relevance assessment tasks. Finally, we show our methodology at work with several examples that are interesting in their own right.
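As one concrete illustration of the inter-agreement metrics mentioned above, the following is a minimal Python sketch, not taken from the paper, that computes Fleiss' kappa over crowdsourced relevance labels. It assumes binary relevance judgments and the same number of workers per query-document pair; the function name and the sample data are purely illustrative.

    from collections import Counter

    def fleiss_kappa(label_matrix):
        """Fleiss' kappa for a list of per-item label lists.

        label_matrix: one inner list per query-document pair, holding the
        labels assigned by each worker (every pair must have the same
        number of labels, e.g. 5 workers per HIT).
        """
        categories = sorted({label for labels in label_matrix for label in labels})
        n_items = len(label_matrix)
        n_raters = len(label_matrix[0])

        # n_ij: how many workers put item i into category j
        counts = [Counter(labels) for labels in label_matrix]

        # Per-item observed agreement P_i
        p_i = [
            (sum(c[cat] ** 2 for cat in categories) - n_raters)
            / (n_raters * (n_raters - 1))
            for c in counts
        ]
        p_bar = sum(p_i) / n_items

        # Chance agreement P_e from the marginal category proportions
        p_j = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
        p_e = sum(p ** 2 for p in p_j)

        return (p_bar - p_e) / (1 - p_e)

    # Illustrative data: 4 query-document pairs, each judged by 5 workers
    # (1 = relevant, 0 = not relevant)
    judgments = [
        [1, 1, 1, 0, 1],
        [0, 0, 0, 0, 1],
        [1, 1, 0, 1, 1],
        [0, 0, 0, 0, 0],
    ]
    print(round(fleiss_kappa(judgments), 3))  # ~0.394 for this sample

A kappa value well above zero indicates agreement beyond chance among workers; in practice such a metric can be used to flag query-document pairs (or workers) whose judgments deviate strongly and may need justification analysis or additional labels.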