On the robustness of relevance measures with incomplete judgments

  • Authors:
  • Tanuja Bompada;Chi-Chao Chang;John Chen;Ravi Kumar;Rajesh Shenoy

  • Affiliations:
  • Yahoo!, Sunnyvale, CA;Yahoo!, Sunnyvale, CA;Yahoo!, Sunnyvale, CA;Yahoo!, Sunnyvale, CA;Yahoo!, Sunnyvale, CA

  • Venue:
  • SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2007

Abstract

We investigate the robustness of three widely used IR relevance measures for large data collections with incomplete judgments. The relevance measures we consider are the bpref measure introduced by Buckley and Voorhees [7], the inferred average precision (infAP) introduced by Aslam and Yilmaz [4], and the normalized discounted cumulative gain (NDCG) measure introduced by Järvelin and Kekäläinen [8]. Our main results show that NDCG consistently performs better than both bpref and infAP. The experiments are performed on standard TREC datasets, under different levels of incompleteness of judgments, and using two different evaluation methods: the Kendall correlation, which measures the agreement in order between system rankings, and pairwise statistical significance testing. The latter may be of independent interest.
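As a rough illustration of the quantities named in the abstract, the following Python sketch computes NDCG for one ranked list and the Kendall correlation between two system rankings. It is not the authors' code. The function names, the log2(rank + 1) discount (one common NDCG variant), the normalization against the same list rather than against all judged documents, the tau-a form without tie correction, and the example scores are all assumptions made for illustration.

```python
from itertools import combinations
from math import log2


def ndcg(gains, k=None):
    """NDCG of a ranked list of graded relevance gains.

    Assumes the common log2(rank + 1) discount; the ideal ordering is
    taken from the same list of gains (a simplification).
    """
    k = len(gains) if k is None else k

    def dcg(gs):
        # Rank i (0-based) is discounted by log2(i + 2), so rank 1 has weight 1.
        return sum(g / log2(i + 2) for i, g in enumerate(gs[:k]))

    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


def kendall_tau(scores_a, scores_b):
    """Kendall tau-a between two {system: score} dicts (no tie correction)."""
    systems = sorted(scores_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        sign = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical mean scores per system with full vs. reduced judgments.
full = {"sysA": 0.42, "sysB": 0.37, "sysC": 0.30, "sysD": 0.25}
reduced = {"sysA": 0.40, "sysB": 0.41, "sysC": 0.28, "sysD": 0.22}

tau = kendall_tau(full, reduced)  # one swapped pair out of six
```

In a robustness study of this kind, a measure would be considered more robust if the system ranking it induces under reduced judgments stays close (tau near 1) to the ranking under the full judgment set.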