On the effectiveness of evaluating retrieval systems in the absence of relevance judgments

  • Authors:
  • Javed A. Aslam; Robert Savell

  • Affiliations:
  • Dartmouth College, Hanover, NH; Dartmouth College, Hanover, NH

  • Venue:
  • Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2003

Abstract

Soboroff, Nicholas, and Cahan recently proposed a method for evaluating the performance of retrieval systems without relevance judgments. They demonstrated that the system evaluations produced by their methodology are correlated with actual evaluations using relevance judgments in the TREC competition. In this work, we propose an explanation for this phenomenon. We devise a simple measure for quantifying the similarity of retrieval systems by assessing the similarity of their retrieved results. Then, given a collection of retrieval systems and their retrieved results, we use this measure to assess the average similarity of a system to the other systems in the collection. We demonstrate that evaluating retrieval systems according to average similarity yields results quite similar to the methodology proposed by Soboroff et al., and we further demonstrate that these two techniques are in fact highly correlated. Thus, the techniques are effectively evaluating and ranking retrieval systems by "popularity" as opposed to "performance."
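
The abstract describes ranking systems by their average similarity to the other systems in the pool, computed only from retrieved results. Below is a minimal sketch of that idea, assuming a top-k set-overlap similarity between ranked runs; the exact measure used in the paper is not specified in the abstract, so `topk_overlap`, `average_similarity`, and the `runs` data layout are illustrative assumptions rather than the authors' implementation.

```python
from itertools import combinations

def topk_overlap(run_a, run_b, k=50):
    """Fraction of documents shared by the top-k of two ranked lists.
    Assumes both lists contain at least k documents (illustrative measure)."""
    a, b = set(run_a[:k]), set(run_b[:k])
    return len(a & b) / k

def average_similarity(runs, k=50):
    """runs: dict mapping system name -> dict mapping topic id -> ranked doc-id list.
    Returns each system's mean similarity to every other system, averaged over
    the topics all systems have in common."""
    systems = list(runs)
    topics = set.intersection(*(set(r) for r in runs.values()))
    sims = {s: [] for s in systems}
    for s, t in combinations(systems, 2):
        score = sum(topk_overlap(runs[s][q], runs[t][q], k) for q in topics) / len(topics)
        sims[s].append(score)
        sims[t].append(score)
    return {s: sum(v) / len(v) for s, v in sims.items()}

# Example usage with hypothetical runs: systems with higher average similarity
# to the rest of the pool rank higher, i.e. ranking by "popularity".
# avg = average_similarity(runs)
# ranking = sorted(avg, key=avg.get, reverse=True)
```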