Bias and the limits of pooling
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Quality management on Amazon Mechanical Turk
Proceedings of the ACM SIGKDD Workshop on Human Computation
On aggregating labels from multiple crowd workers to infer relevance of documents
ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval
Hi-index | 0.00 |
The TREC 2012 Crowdsourcing track asked participants to crowdsource relevance assessments with the goal of replicating costly expert judgements with relatively fast, inexpensive, but less reliable judgements from anonymous online workers. The track used 10 "ad-hoc" queries, highly specific and complex (as compared to web search). The crowdsourced assessments were evaluated against expert judgments made by highly trained and capable human analysts in 1999 as part of ad hoc track collection construction. Since most crowdsourcing approaches submitted to the TREC 2012 track produced assessment sets nowhere close to the expert judgements, we decided to analyze crowdsourcing mistakes made on this task using data we collected via Amazon's Mechanical Turk service. We investigate two types of crowdsourcing approaches: one that asks for nominal relevance grades for each document, and the other that asks for preferences on many (not all) pairs of documents.