Crowdsourcing has recently gained a lot of attention as a tool for conducting different kinds of relevance evaluations. At a very high level, crowdsourcing describes the outsourcing of tasks to a large group of people instead of assigning them to in-house employees. This approach makes it possible to conduct information retrieval experiments extremely fast, with good results and at low cost. This paper reports on the first attempts to combine crowdsourcing and TREC: our aim is to validate the use of crowdsourcing for relevance assessment. To this end, we use the Amazon Mechanical Turk crowdsourcing platform to run experiments on TREC data, evaluate the outcomes, and discuss the results. We emphasize experiment design, execution, and quality control as prerequisites for gathering useful results, with particular attention to the issue of agreement among assessors. Our position, supported by the experimental results, is that crowdsourcing is a cheap, quick, and reliable alternative for relevance assessment.
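The abstract points to agreement among assessors as a central quality-control concern. As an illustration only (the abstract does not state which agreement measure the paper uses), a common chance-corrected statistic for comparing two assessors' binary relevance judgments is Cohen's kappa; this is a minimal sketch, and the worker labels below are hypothetical:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two assessors' relevance labels."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of documents labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each assessor's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(labels_a) | set(labels_b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical binary relevance judgments (1 = relevant) from two crowd workers.
worker_1 = [1, 1, 0, 1, 0, 0, 1, 0]
worker_2 = [1, 0, 0, 1, 0, 1, 1, 0]
print(f"kappa = {cohens_kappa(worker_1, worker_2):.2f}")
```

Values near 1 indicate strong agreement beyond chance, values near 0 indicate agreement no better than chance; in a crowdsourcing setting such a statistic can be computed per topic or per worker pair as one quality-control signal.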