Efficient construction of large test collections
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Ranking retrieval systems without relevance judgments
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Estimating average precision with incomplete and imperfect judgments
Proceedings of the 15th ACM international conference on Information and knowledge management
Information Processing and Management: an International Journal
Robust test collections for retrieval evaluation
Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating epistemic uncertainty under incomplete assessments
Information Processing and Management: an International Journal
Generative model-based metasearch for data fusion in information retrieval
Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
On rank correlation and the distance between rankings
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Relying on topic subsets for system ranking estimation
Proceedings of the 18th ACM conference on Information and knowledge management
A retrieval evaluation methodology for incomplete relevance assessments
Proceedings of the 29th European conference on IR research
Retrieval system evaluation: automatic evaluation versus incomplete judgments
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Research methodology in studies of assessor effort for information retrieval evaluation
Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
Using clustering to improve retrieval evaluation without relevance judgments
Proceedings of the 23rd International Conference on Computational Linguistics: Posters
An overview of Web search evaluation methods
Computers and Electrical Engineering
A case for automatic system evaluation
Proceedings of the 32nd European conference on Advances in Information Retrieval
Soboroff, Nicholas and Cahan recently proposed a method for evaluating the performance of retrieval systems without relevance judgments. They demonstrated that the system rankings produced by their methodology correlate with actual evaluations based on relevance judgments in the TREC competition. In this work, we propose an explanation for this phenomenon. We devise a simple measure that quantifies the similarity of two retrieval systems by assessing the similarity of their retrieved results. Given a collection of retrieval systems and their retrieved results, we then use this measure to compute the average similarity of each system to the other systems in the collection. We demonstrate that ranking retrieval systems by average similarity yields results quite similar to the methodology proposed by Soboroff et al., and we further demonstrate that the two techniques are in fact highly correlated. Thus, both techniques effectively evaluate and rank retrieval systems by "popularity" as opposed to "performance."
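The abstract does not specify the similarity measure, so the following is only a minimal sketch of the idea. It assumes similarity between two systems on a topic is the Jaccard overlap of their top-k retrieved document sets, averaged over topics; the choice of Jaccard overlap, the cutoff k=100, and all function names are illustrative assumptions, not the paper's actual definitions.

```python
from itertools import combinations

def topic_similarity(run_a, run_b, k=100):
    """Jaccard overlap of the top-k documents two systems retrieve for one topic.

    run_a, run_b: ranked lists of document IDs for the same topic.
    (Jaccard overlap is an assumed stand-in for the paper's measure.)
    """
    a, b = set(run_a[:k]), set(run_b[:k])
    return len(a & b) / len(a | b) if a | b else 0.0

def system_similarity(sys_a, sys_b, k=100):
    """Mean per-topic similarity between two systems.

    sys_a, sys_b: dicts mapping topic ID -> ranked list of document IDs.
    """
    topics = sys_a.keys() & sys_b.keys()
    if not topics:
        return 0.0
    return sum(topic_similarity(sys_a[t], sys_b[t], k) for t in topics) / len(topics)

def rank_by_average_similarity(runs, k=100):
    """Rank each system by its mean similarity to every other system.

    runs: dict mapping system name -> {topic ID: ranked doc-ID list}.
    Returns (system, average similarity) pairs, most "popular" first.
    """
    names = list(runs)
    sims = {name: [] for name in names}
    for a, b in combinations(names, 2):
        s = system_similarity(runs[a], runs[b], k)
        sims[a].append(s)
        sims[b].append(s)
    avg = {name: sum(v) / len(v) for name, v in sims.items()}
    return sorted(avg.items(), key=lambda kv: kv[1], reverse=True)
```

Under this reading, a system scores highly when it retrieves what most other systems retrieve, which is exactly the sense in which the abstract says such techniques rank systems by "popularity" rather than "performance."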