Crowdsourcing has emerged as a viable platform for conducting many types of relevance evaluation. The main reason behind this trend is that it makes it possible to conduct experiments extremely fast, with good results, at low cost. However, as with any experiment, several implementation details determine whether an experiment succeeds or fails. To gather useful results, a successful crowdsourcing experiment requires clear instructions, sound user interface guidelines, high content quality, inter-rater agreement metrics, work quality control, and worker feedback. Furthermore, designing and implementing experiments that require thousands or millions of labels differs from conducting small-scale research investigations. In this paper we outline a framework for conducting continuous crowdsourcing experiments, emphasizing aspects that matter for all sorts of tasks. We illustrate the value of the characteristics that can affect the overall outcome using examples based on TREC, INEX, and Wikipedia data sets.
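One of the quality signals named above is inter-rater agreement between workers. As a minimal illustrative sketch (not taken from the paper), the snippet below computes Cohen's kappa for two workers who judged the same set of topic-document pairs; the worker labels are hypothetical and any real experiment would use its own collected judgments and, for more than two workers, a multi-rater statistic such as Fleiss' kappa.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two assessors labeling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items on which the two assessors agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each assessor's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a.keys() | freq_b.keys())
    return (p_o - p_e) / (1 - p_e)

# Hypothetical binary relevance labels (1 = relevant, 0 = not relevant)
# from two crowd workers judging the same ten topic-document pairs.
worker_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
worker_2 = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]

print(f"Cohen's kappa: {cohens_kappa(worker_1, worker_2):.2f}")  # ~0.58
```

A chance-corrected statistic like this is preferable to raw percentage agreement because relevance labels are often highly skewed, so two workers can agree frequently just by labeling most documents non-relevant.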