Standard approaches to evaluating and comparing information retrieval systems compute simple averages of performance statistics across individual topics to measure overall system performance. However, topics vary in how well they differentiate among systems on the basis of retrieval performance. At the same time, systems that perform well on highly discriminative topics demonstrate qualities that should be reflected in their evaluation and ranking. This has motivated research on alternative performance measures that are sensitive to the discriminative value of topics and to the performance consistency of systems. In this paper we provide a mathematical formulation of a performance measure that captures the mutual dependence between system and topic characteristics. We propose the Generalized Adaptive-Weight Mean (GAWM) measure and show how it can be computed as a fixed point of a function to which the Brouwer Fixed Point Theorem applies. This guarantees the existence of a scoring scheme that satisfies the starting axioms and can be used for ranking both systems and topics. We apply our method to TREC experiments and compare the GAWM with the standard averages used in TREC.
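The abstract does not spell out the GAWM formula, but the fixed-point construction it describes can be illustrated with a small sketch: system scores are weighted means of per-topic performance, topic weights depend in turn on the current system scores, and the two are updated together until they stabilise. The weighting rule below (a topic counts more when its per-system scores agree with the overall system scores) and the name adaptive_weight_mean are illustrative assumptions, not the paper's definition; the paper establishes the existence of such a fixed point via the Brouwer theorem rather than by iteration.

```python
# Minimal sketch of an adaptive-weight mean computed as a fixed point.
# Assumption: topic weights are proportional to how strongly a topic's
# per-system scores agree with the current overall system scores.
import numpy as np

def adaptive_weight_mean(perf, n_iter=200, tol=1e-10):
    """perf: (n_systems, n_topics) matrix of per-topic effectiveness scores
    (e.g. average precision). Returns (system_scores, topic_weights)."""
    n_systems, n_topics = perf.shape
    topic_weights = np.full(n_topics, 1.0 / n_topics)  # start from the plain mean

    for _ in range(n_iter):
        # System score: weighted mean of the system's per-topic performance.
        system_scores = perf @ topic_weights

        # Topic weight (illustrative rule): covariance-like agreement between
        # a topic's per-system scores and the current overall system scores.
        centred = perf - perf.mean(axis=0)
        centred_sys = system_scores - system_scores.mean()
        agreement = centred.T @ centred_sys
        agreement = np.clip(agreement, 1e-12, None)   # keep weights positive
        new_weights = agreement / agreement.sum()     # stay on the simplex

        if np.max(np.abs(new_weights - topic_weights)) < tol:
            topic_weights = new_weights
            break
        topic_weights = new_weights

    return perf @ topic_weights, topic_weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    perf = rng.uniform(size=(5, 8))   # 5 systems, 8 topics (synthetic demo data)
    scores, weights = adaptive_weight_mean(perf)
    print("plain per-system mean:", perf.mean(axis=1))
    print("adaptive-weight mean: ", scores)
    print("topic weights:        ", weights)
```

Because the weights are kept nonnegative and normalised to sum to one, each update maps the simplex to itself, which is the kind of compact convex setting in which the Brouwer Fixed Point Theorem guarantees a fixed point; the demo simply contrasts the resulting scores with the plain per-system mean on synthetic data.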