In recent years, crowdsourcing has emerged as a viable platform for conducting relevance assessments. The main reason behind this trend is that it makes it possible to run experiments quickly, with good results, and at low cost. However, as in any experiment, many details determine whether an experiment succeeds or fails. To gather useful results, user interface guidelines, inter-agreement metrics, and justification analysis are important aspects of a successful crowdsourcing experiment. In this work we explore the design and execution of relevance judgments using Amazon Mechanical Turk as the crowdsourcing platform, introducing a methodology for crowdsourcing relevance assessments and reporting the results of a series of experiments on TREC 8 with a fixed budget. Our findings indicate that workers are as good as TREC experts, and they even provide detailed feedback for certain query-document pairs. We also explore the importance of document design and presentation when performing relevance assessment tasks. Finally, we show our methodology at work with several examples that are interesting in their own right.
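As one concrete illustration of the inter-agreement metrics mentioned above, the following is a minimal Python sketch, not taken from the paper, that computes Fleiss' kappa over crowdsourced relevance labels. It assumes binary relevance judgments and the same number of workers per query-document pair; the function name and the sample data are purely illustrative.

    from collections import Counter

    def fleiss_kappa(label_matrix):
        """Fleiss' kappa for a list of per-item label lists.

        label_matrix: one inner list per query-document pair, holding the
        labels assigned by each worker (every pair must have the same
        number of labels, e.g. 5 workers per HIT).
        """
        categories = sorted({label for labels in label_matrix for label in labels})
        n_items = len(label_matrix)
        n_raters = len(label_matrix[0])

        # n_ij: how many workers put item i into category j
        counts = [Counter(labels) for labels in label_matrix]

        # Per-item observed agreement P_i
        p_i = [
            (sum(c[cat] ** 2 for cat in categories) - n_raters)
            / (n_raters * (n_raters - 1))
            for c in counts
        ]
        p_bar = sum(p_i) / n_items

        # Chance agreement P_e from the marginal category proportions
        p_j = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
        p_e = sum(p ** 2 for p in p_j)

        return (p_bar - p_e) / (1 - p_e)

    # Illustrative data: 4 query-document pairs, each judged by 5 workers
    # (1 = relevant, 0 = not relevant)
    judgments = [
        [1, 1, 1, 0, 1],
        [0, 0, 0, 0, 1],
        [1, 1, 0, 1, 1],
        [0, 0, 0, 0, 0],
    ]
    print(round(fleiss_kappa(judgments), 3))  # ~0.394 for this sample

A kappa value well above zero indicates agreement beyond chance among workers; in practice such a metric can be used to flag query-document pairs (or workers) whose judgments deviate strongly and may need justification analysis or additional labels.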