We investigate the human factors involved in designing effective Human Intelligence Tasks (HITs) for Amazon's Mechanical Turk (MTurk). In particular, we use MTurk to assess the relevance of documents to search queries in order to evaluate search engine accuracy. Our study varies four human factors and measures the resulting experimental outcomes in terms of cost, time, and accuracy of the assessments. While the results are largely inconclusive, we identify important obstacles encountered, lessons learned, related work, and promising directions for future investigation. The experimental data are also made publicly available for further study by the community.
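To make the experimental setup concrete, the sketch below shows one way such a factorial HIT study could be organized in Python. The abstract does not name the four factors varied, so the factor names and levels here are hypothetical placeholders, and the accuracy measure shown (majority vote against a small gold-labeled set) is one common choice, not necessarily the authors' method.

# A minimal sketch (not the authors' code) of a full-factorial HIT design:
# each combination of factor levels becomes one HIT variant, and per-variant
# accuracy is estimated by majority vote against a small gold set.
# Factor names and levels below are hypothetical placeholders.
from collections import Counter
from itertools import product

factors = {
    "base_pay_usd": [0.01, 0.02],
    "bonus_offered": [False, True],
    "title_wording": ["plain", "engaging"],
    "qualification_required": [False, True],
}

# Enumerate the 2 x 2 x 2 x 2 = 16 HIT variants.
variants = [dict(zip(factors, levels)) for levels in product(*factors.values())]

def majority_vote(labels):
    """Return the most common label among the workers' judgments for one document."""
    return Counter(labels).most_common(1)[0][0]

def variant_accuracy(judgments, gold):
    """judgments: {doc_id: [worker labels]}; gold: {doc_id: true label}."""
    correct = sum(majority_vote(judgments[d]) == gold[d] for d in gold)
    return correct / len(gold)

# Toy example: two documents, three worker judgments per document.
gold = {"d1": "relevant", "d2": "not_relevant"}
judgments = {"d1": ["relevant", "relevant", "not_relevant"],
             "d2": ["not_relevant", "not_relevant", "relevant"]}
print(len(variants), "variants; accuracy =", variant_accuracy(judgments, gold))

In an actual deployment, each variant would be posted as a separate group of HITs on MTurk, with per-variant cost and completion time recorded alongside the accuracy estimate, so that the effect of each factor on the three outcome measures can be compared.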