Efficient construction of large test collections
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Ranking retrieval systems without relevance judgments
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Estimating average precision with incomplete and imperfect judgments
Proceedings of the 15th ACM international conference on Information and knowledge management
Information Processing and Management: an International Journal
Robust test collections for retrieval evaluation
Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating epistemic uncertainty under incomplete assessments
Information Processing and Management: an International Journal
Generative model-based metasearch for data fusion in information retrieval
Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
On rank correlation and the distance between rankings
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Relying on topic subsets for system ranking estimation
Proceedings of the 18th ACM conference on Information and knowledge management
A retrieval evaluation methodology for incomplete relevance assessments
Proceedings of the 29th European conference on IR research
Retrieval system evaluation: automatic evaluation versus incomplete judgments
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Research methodology in studies of assessor effort for information retrieval evaluation
Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
Using clustering to improve retrieval evaluation without relevance judgments
Proceedings of the 23rd International Conference on Computational Linguistics: Posters
An overview of Web search evaluation methods
Computers and Electrical Engineering
A case for automatic system evaluation
Proceedings of the 32nd European conference on Advances in Information Retrieval
Soboroff, Nicholas and Cahan recently proposed a method for evaluating the performance of retrieval systems without relevance judgments. They demonstrated that the system rankings produced by their methodology correlate with actual evaluations based on relevance judgments in the TREC competition. In this work, we propose an explanation for this phenomenon. We devise a simple measure that quantifies the similarity of two retrieval systems by assessing the similarity of their retrieved results. Given a collection of retrieval systems and their retrieved results, we then use this measure to compute the average similarity of each system to the other systems in the collection. We demonstrate that ranking retrieval systems by average similarity yields results quite similar to the methodology proposed by Soboroff et al., and we further demonstrate that the two techniques are in fact highly correlated. Thus, both techniques effectively evaluate and rank retrieval systems by "popularity" as opposed to "performance."
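The abstract does not specify the similarity measure, so the following is only a minimal sketch of the idea. It assumes similarity between two systems on a topic is the Jaccard overlap of their top-k retrieved document sets, averaged over topics; the choice of Jaccard overlap, the cutoff k=100, and all function names are illustrative assumptions, not the paper's actual definitions.

```python
from itertools import combinations

def topic_similarity(run_a, run_b, k=100):
    """Jaccard overlap of the top-k documents two systems retrieve for one topic.

    run_a, run_b: ranked lists of document IDs for the same topic.
    (Jaccard overlap is an assumed stand-in for the paper's measure.)
    """
    a, b = set(run_a[:k]), set(run_b[:k])
    return len(a & b) / len(a | b) if a | b else 0.0

def system_similarity(sys_a, sys_b, k=100):
    """Mean per-topic similarity between two systems.

    sys_a, sys_b: dicts mapping topic ID -> ranked list of document IDs.
    """
    topics = sys_a.keys() & sys_b.keys()
    if not topics:
        return 0.0
    return sum(topic_similarity(sys_a[t], sys_b[t], k) for t in topics) / len(topics)

def rank_by_average_similarity(runs, k=100):
    """Rank each system by its mean similarity to every other system.

    runs: dict mapping system name -> {topic ID: ranked doc-ID list}.
    Returns (system, average similarity) pairs, most "popular" first.
    """
    names = list(runs)
    sims = {name: [] for name in names}
    for a, b in combinations(names, 2):
        s = system_similarity(runs[a], runs[b], k)
        sims[a].append(s)
        sims[b].append(s)
    avg = {name: sum(v) / len(v) for name, v in sims.items()}
    return sorted(avg.items(), key=lambda kv: kv[1], reverse=True)
```

Under this reading, a system scores highly when it retrieves what most other systems retrieve, which is exactly the sense in which the abstract says such techniques rank systems by "popularity" rather than "performance."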