We investigate the human factors involved in designing effective Human Intelligence Tasks (HITs) for Amazon's Mechanical Turk (MTurk). In particular, we use MTurk to assess the relevance of documents to search queries in order to evaluate search engine accuracy. Our study varies four human factors and measures the resulting experimental outcomes in terms of cost, time, and accuracy of the assessments. While the results are largely inconclusive, we identify important obstacles encountered, lessons learned, related work, and promising directions for future investigation. The experimental data are also made publicly available for further study by the community.
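To make the experimental setup concrete, the sketch below shows one way such a factorial HIT study could be organized in Python. The abstract does not name the four factors varied, so the factor names and levels here are hypothetical placeholders, and the accuracy measure shown (majority vote against a small gold-labeled set) is one common choice, not necessarily the authors' method.

# A minimal sketch (not the authors' code) of a full-factorial HIT design:
# each combination of factor levels becomes one HIT variant, and per-variant
# accuracy is estimated by majority vote against a small gold set.
# Factor names and levels below are hypothetical placeholders.
from collections import Counter
from itertools import product

factors = {
    "base_pay_usd": [0.01, 0.02],
    "bonus_offered": [False, True],
    "title_wording": ["plain", "engaging"],
    "qualification_required": [False, True],
}

# Enumerate the 2 x 2 x 2 x 2 = 16 HIT variants.
variants = [dict(zip(factors, levels)) for levels in product(*factors.values())]

def majority_vote(labels):
    """Return the most common label among the workers' judgments for one document."""
    return Counter(labels).most_common(1)[0][0]

def variant_accuracy(judgments, gold):
    """judgments: {doc_id: [worker labels]}; gold: {doc_id: true label}."""
    correct = sum(majority_vote(judgments[d]) == gold[d] for d in gold)
    return correct / len(gold)

# Toy example: two documents, three worker judgments per document.
gold = {"d1": "relevant", "d2": "not_relevant"}
judgments = {"d1": ["relevant", "relevant", "not_relevant"],
             "d2": ["not_relevant", "not_relevant", "relevant"]}
print(len(variants), "variants; accuracy =", variant_accuracy(judgments, gold))

In an actual deployment, each variant would be posted as a separate group of HITs on MTurk, with per-variant cost and completion time recorded alongside the accuracy estimate, so that the effect of each factor on the three outcome measures can be compared.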