Human assessments of document relevance are needed for the construction of test collections, for ad hoc evaluation, and for training text classifiers. Showing documents to assessors in different orderings, however, may lead to different assessment outcomes. We examine the effect that "threshold priming", exposure to documents of varying degrees of relevance, has on people's calibration of relevance. Participants judged the relevance of a prologue of documents containing highly relevant, moderately relevant, or non-relevant documents, followed by a common epilogue of documents of mixed relevance. We observe that participants exposed to only non-relevant documents in the prologue assigned significantly higher average relevance scores to both prologue and epilogue documents than participants exposed to moderately or highly relevant documents in the prologue. We also examine how "need for cognition", an individual-difference measure of the extent to which a person enjoys engaging in effortful cognitive activity, affects relevance assessments. Participants high in need for cognition agreed with expert assessors significantly more often than participants low in need for cognition did. Our findings indicate that assessors should be exposed to documents from multiple relevance levels early in the judging process, so that they calibrate their relevance thresholds in a balanced way, and that individual-difference measures may be a useful way to screen assessors.