In many situations, humans judging document relevance are forced to trade off accuracy for speed. The development of better interactive retrieval systems and relevance assessment platforms requires measuring assessor accuracy, but to date the subjective nature of relevance has prevented such measurement. To quantify assessor performance, we define relevance to be a group's majority opinion, and demonstrate the value of this approach by comparing the performance of NIST assessors to a group of assessors representative of participants in many information retrieval user studies. Using data collected as part of a user study with 48 participants, we found that NIST assessors discriminate between relevant and non-relevant documents better than the average participant in our study, but that the NIST assessors' true positive rate is no better than that of the study participants. In addition, we found NIST assessors to be conservative in their judgments of relevance compared to the average participant.
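The following is a minimal sketch, not taken from the paper, of the evaluation idea the abstract describes: take the majority opinion of a group of assessors as the gold-standard relevance label, then score an individual assessor's true positive rate, false positive rate, and a signal-detection style discrimination measure against that standard. The function names, the example data, and the use of d' as the discrimination measure are assumptions made for illustration.

```python
# Hypothetical sketch: majority-vote gold standard plus per-assessor accuracy measures.
from statistics import NormalDist

def majority_label(votes):
    """Group's majority opinion: relevant (1) if more than half of the votes are relevant."""
    return 1 if sum(votes) > len(votes) / 2 else 0

def assessor_rates(judgments, gold):
    """True positive and false positive rates of one assessor against the gold labels."""
    tp = sum(1 for a, g in zip(judgments, gold) if a == 1 and g == 1)
    fp = sum(1 for a, g in zip(judgments, gold) if a == 1 and g == 0)
    pos = sum(gold)
    neg = len(gold) - pos
    return (tp / pos if pos else 0.0), (fp / neg if neg else 0.0)

def d_prime(tpr, fpr, eps=1e-3):
    """Signal-detection discrimination d' = z(TPR) - z(FPR), with rates clipped away from 0/1."""
    z = NormalDist().inv_cdf
    clip = lambda r: min(max(r, eps), 1 - eps)
    return z(clip(tpr)) - z(clip(fpr))

# Hypothetical data: each row is one document, columns are the group members' binary votes.
group_votes = [
    [1, 1, 0, 1],  # doc 1
    [0, 0, 1, 0],  # doc 2
    [1, 1, 1, 1],  # doc 3
    [0, 1, 0, 0],  # doc 4
]
gold = [majority_label(v) for v in group_votes]

# Hypothetical judgments by a single assessor (e.g., a NIST assessor) on the same documents.
assessor_judgments = [1, 0, 1, 0]
tpr, fpr = assessor_rates(assessor_judgments, gold)
print(f"TPR={tpr:.2f}  FPR={fpr:.2f}  d'={d_prime(tpr, fpr):.2f}")
```

Under this framing, an assessor can match the group's true positive rate while still discriminating better, by keeping the false positive rate low, which is one way to read the abstract's finding that NIST assessors are comparatively conservative.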