Contextual and dimensional relevance judgments for reusable SERP-level evaluation

  • Authors:
  • Peter B. Golbus; Imed Zitouni; Jin Young Kim; Ahmed Hassan; Fernando Diaz

  • Affiliations:
  • Northeastern University, Boston, MA, USA; Bing, Redmond, WA, USA; Bing, Redmond, WA, USA; Microsoft Research, Redmond, WA, USA; Microsoft Research, Redmond, WA, USA

  • Venue:
  • Proceedings of the 23rd International Conference on World Wide Web
  • Year:
  • 2014

Abstract

Document-level relevance judgments are a major component in the calculation of effectiveness metrics. Collecting high-quality judgments is therefore a critical step in information retrieval evaluation. However, the nature of relevance judgment collection and the assumptions underlying it have received little attention. In particular, relevance judgments are typically collected for each document in isolation, although users read each document in the context of other documents. In this work, we investigate the nature of relevance judgment collection. We collect relevance labels in both isolated and conditional settings, and ask for judgments along various dimensions of relevance as well as for overall relevance. We then compare relevance metrics based on the various types of judgments with other measures of quality, such as user preference. Our analyses illuminate how these settings for judgment collection affect the quality and characteristics of the judgments. We also find that metrics based on conditional judgments correlate more highly with user preference than metrics based on isolated judgments.