In crowdsourced relevance judging, each crowd worker typically judges only a small number of examples. The result is a sparse and imbalanced set of judgments in which relatively few workers influence the output consensus labels, particularly with simple consensus methods such as majority voting. We show how probabilistic matrix factorization (PMF), a standard approach in collaborative filtering, can be used to infer missing worker judgments so that all workers influence the output labels. Given the complete worker judgments inferred by PMF, we evaluate the impact in unsupervised and supervised scenarios. In the supervised case, we consider both weighted voting and worker selection strategies based on worker accuracy. Experiments on crowd judgments from the 2010 TREC Relevance Feedback Track show that the PMF approach is promising and merits further investigation and analysis.
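The idea of completing a sparse worker-by-example judgment matrix and then voting over the completed matrix can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes judgments are encoded as +1 (relevant), -1 (non-relevant), and 0 (missing), fits a MAP estimate of PMF by plain gradient steps on the regularized squared error, and uses toy data. The function names `pmf_complete` and `majority_vote`, the rank, the regularization weight, the learning rate, and the epoch count are all illustrative assumptions.

```python
import numpy as np

def pmf_complete(R, mask, rank=2, lam=0.1, lr=0.02, epochs=2000, seed=0):
    """Fit R ~ U @ V.T on observed entries only and return the dense prediction.

    Hypothetical helper: hyperparameters are illustrative, not from the paper.
    """
    rng = np.random.default_rng(seed)
    n_workers, n_items = R.shape
    U = 0.1 * rng.standard_normal((n_workers, rank))  # worker latent factors
    V = 0.1 * rng.standard_normal((n_items, rank))    # example latent factors
    for _ in range(epochs):
        err = mask * (R - U @ V.T)       # residual, zeroed where no judgment exists
        grad_U = err @ V - lam * U       # ascent direction of the log-posterior
        grad_V = err.T @ U - lam * V
        U += lr * grad_U
        V += lr * grad_V
    return U @ V.T

def majority_vote(J):
    """Consensus label per example (column): 1 if most votes are positive, else 0."""
    votes = np.where(J >= 0, 1, -1)      # threshold real-valued scores to +1 / -1
    return (votes.sum(axis=0) > 0).astype(int)  # ties fall to 0 (non-relevant)

# Toy data (assumed): 4 workers x 5 examples, 0 marks a missing judgment.
R = np.array([
    [ 1,  0, -1,  0,  1],
    [ 1, -1,  0,  0,  0],
    [ 0, -1, -1,  1,  0],
    [ 0,  0,  0,  1,  1],
], dtype=float)
mask = R != 0

pred = pmf_complete(R, mask)
completed = np.where(mask, R, pred)      # keep real judgments, fill only the gaps
labels = majority_vote(completed)
print(labels)
```

After completion, every worker contributes a vote on every example, which is the property the abstract highlights. The supervised variants it mentions (weighted voting, worker selection) would replace `majority_vote` with a rule that weights or filters rows by estimated worker accuracy.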