Exploring evaluation metrics: GMAP versus MAP

Authors:
Sri Devi Ravana;Alistair Moffat
Affiliations:
The University of Melbourne, Melbourne, Australia;The University of Melbourne, Melbourne, Australia
Venue:
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2008

Abstract

In retrieval experiments, an effectiveness metric is used to generate a score for each system-topic pair being tested. It is then usual to average the system-topic scores to obtain a system score, which is used for the purpose of system comparison. In this paper we explore the ramifications of using the geometric mean (GMAP), rather than the arithmetic mean (MAP), when computing an aggregate system score from a set of system-topic scores. We find that GMAP does indeed handle variability in topic difficulty more consistently than does the usual MAP aggregation method.
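The difference between the two aggregation methods can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the per-topic scores, and the small floor `eps` (used here so that a zero score does not drive the geometric mean to zero) are all assumptions made for illustration.

```python
import math


def mean_ap(scores):
    """Arithmetic mean of per-topic scores (MAP-style aggregation)."""
    return sum(scores) / len(scores)


def geometric_mean_ap(scores, eps=1e-5):
    """Geometric mean of per-topic scores (GMAP-style aggregation).

    Each score is floored at eps before taking logs; the floor value
    is an illustrative assumption, not a value from the paper.
    """
    logs = [math.log(max(s, eps)) for s in scores]
    return math.exp(sum(logs) / len(logs))


# Hypothetical per-topic average precision for two systems over the
# same three topics; the third topic is the "hard" one.
system_a = [0.40, 0.40, 0.10]
system_b = [0.45, 0.44, 0.01]

for name, scores in (("A", system_a), ("B", system_b)):
    print(name, round(mean_ap(scores), 4), round(geometric_mean_ap(scores), 4))
```

With these made-up scores both systems have the same arithmetic mean, but system B has the lower geometric mean because its weak result on the hard topic is penalized more heavily. This is the sense in which the geometric mean is more sensitive to poorly performing topics.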