How reliable are the results of large-scale information retrieval experiments?
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating evaluation measure stability
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Information retrieval system evaluation: effort, sensitivity, and reliability
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Hits hits TREC: exploring IR evaluation results with network analysis
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Boiling down information retrieval test collections
RIAO '10 Adaptivity, Personalization and Fusion of Heterogeneous Information
Hi-index | 0.00 |
The need for evaluating large amounts of topics (queries) makes IR evaluation an uneasy task. In this paper, we study a topic selection problem for IR evaluation. The selection criterion is based on the overall difficulty of the chosen set, as well as the uncertainty of the final IR metric applied to the systems. Our preliminary experiments demonstrate that our approach helps to identify a set of topics that provides confident estimates of systems' performance while keeping the requirement of the query difficulty.