Information retrieval systems are compared using evaluation metrics, with researchers commonly reporting results for simple metrics such as precision-at-10 or reciprocal rank together with more complex ones such as average precision or discounted cumulative gain. In this paper, we demonstrate that the complex metrics are as good as or better than the simple metrics at predicting the performance of the simple metrics on other topics. Therefore, reporting results for simple metrics alongside those for complex metrics is redundant.
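For readers unfamiliar with the four metrics named above, the sketch below (not taken from the paper) shows how they are conventionally computed for a single ranked list with binary relevance judgments; the function names and the toy ranking are illustrative assumptions, and the paper's cross-topic prediction analysis is not reproduced here.

import math

def precision_at_k(rels, k=10):
    # Fraction of the top-k retrieved documents that are relevant.
    return sum(rels[:k]) / k

def reciprocal_rank(rels):
    # 1 / rank of the first relevant document, or 0 if none is retrieved.
    for i, r in enumerate(rels, start=1):
        if r:
            return 1.0 / i
    return 0.0

def average_precision(rels, num_relevant=None):
    # Mean of the precision values observed at the ranks of relevant documents.
    if num_relevant is None:
        num_relevant = sum(rels)
    if num_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for i, r in enumerate(rels, start=1):
        if r:
            hits += 1
            total += hits / i
    return total / num_relevant

def dcg(rels, k=None):
    # Discounted cumulative gain with the standard log2(rank + 1) discount.
    rels = rels if k is None else rels[:k]
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels, start=1))

# Toy ranked list: 1 = relevant, 0 = not relevant.
ranking = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
print(precision_at_k(ranking), reciprocal_rank(ranking),
      average_precision(ranking), dcg(ranking, k=10))

Precision-at-10 and reciprocal rank use only shallow or single-position information, whereas average precision and DCG aggregate evidence over the whole ranking, which is the distinction underlying the paper's claim about predictive power across topics.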