Standard approaches to evaluating and comparing information retrieval systems compute simple averages of performance statistics across individual topics to measure overall system performance. However, topics vary in how well they differentiate among systems on the basis of retrieval performance. At the same time, systems that perform well on highly discriminative topics demonstrate qualities that should be reflected in their evaluation and ranking. This has motivated research on alternative performance measures that are sensitive to the discriminative value of topics and to the performance consistency of systems. In this paper we provide a mathematical formulation of a performance measure that captures the mutual dependence between system and topic characteristics. We propose the Generalized Adaptive-Weight Mean (GAWM) measure and show how it can be computed as a fixed point of a function to which the Brouwer Fixed Point Theorem applies. This guarantees the existence of a scoring scheme that satisfies the starting axioms and can be used for ranking both systems and topics. We apply our method to TREC experiments and compare the GAWM with the standard averages used in TREC.
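The abstract does not spell out the GAWM formula, but the fixed-point construction it describes can be illustrated with a small sketch: system scores are weighted means of per-topic performance, topic weights depend in turn on the current system scores, and the two are updated together until they stabilise. The weighting rule below (a topic counts more when its per-system scores agree with the overall system scores) and the name adaptive_weight_mean are illustrative assumptions, not the paper's definition; the paper establishes the existence of such a fixed point via the Brouwer theorem rather than by iteration.

```python
# Minimal sketch of an adaptive-weight mean computed as a fixed point.
# Assumption: topic weights are proportional to how strongly a topic's
# per-system scores agree with the current overall system scores.
import numpy as np

def adaptive_weight_mean(perf, n_iter=200, tol=1e-10):
    """perf: (n_systems, n_topics) matrix of per-topic effectiveness scores
    (e.g. average precision). Returns (system_scores, topic_weights)."""
    n_systems, n_topics = perf.shape
    topic_weights = np.full(n_topics, 1.0 / n_topics)  # start from the plain mean

    for _ in range(n_iter):
        # System score: weighted mean of the system's per-topic performance.
        system_scores = perf @ topic_weights

        # Topic weight (illustrative rule): covariance-like agreement between
        # a topic's per-system scores and the current overall system scores.
        centred = perf - perf.mean(axis=0)
        centred_sys = system_scores - system_scores.mean()
        agreement = centred.T @ centred_sys
        agreement = np.clip(agreement, 1e-12, None)   # keep weights positive
        new_weights = agreement / agreement.sum()     # stay on the simplex

        if np.max(np.abs(new_weights - topic_weights)) < tol:
            topic_weights = new_weights
            break
        topic_weights = new_weights

    return perf @ topic_weights, topic_weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    perf = rng.uniform(size=(5, 8))   # 5 systems, 8 topics (synthetic demo data)
    scores, weights = adaptive_weight_mean(perf)
    print("plain per-system mean:", perf.mean(axis=1))
    print("adaptive-weight mean: ", scores)
    print("topic weights:        ", weights)
```

Because the weights are kept nonnegative and normalised to sum to one, each update maps the simplex to itself, which is the kind of compact convex setting in which the Brouwer Fixed Point Theorem guarantees a fixed point; the demo simply contrasts the resulting scores with the plain per-system mean on synthetic data.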