Based on a theoretical analysis of populations and sampling, for both topics and documents, and of the parameters with which we hope to characterise the effectiveness of different systems, we propose a modification to the traditional average precision metric. The modification involves both a transformation and, in the estimation of the parameter, smoothing. On a substantial dataset, the modified version is shown to have certain distributional advantages: in particular, the distribution of its values over topics, for a given system/run, is approximately normal.
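As a rough illustration of what a transformation combined with smoothing might look like, the Python sketch below computes traditional average precision for one topic and then applies a smoothed logit. The logit transform, the function name `smoothed_logit`, and the constant `epsilon` are assumptions chosen for illustration only; the abstract does not state which transformation or smoothing method is used.

```python
import math


def average_precision(ranked_relevance, num_relevant=None):
    """Traditional (non-interpolated) average precision for one topic.

    ranked_relevance: list of 0/1 relevance flags in rank order.
    num_relevant: total number of relevant documents for the topic;
    defaults to the number of relevant documents in the ranking.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            # Precision at this rank, accumulated only at relevant documents.
            precision_sum += hits / rank
    total = num_relevant if num_relevant is not None else hits
    return precision_sum / total if total else 0.0


def smoothed_logit(ap, epsilon=0.01):
    """Hypothetical transform (an assumption, not taken from the source).

    Maps a value in [0, 1] onto the real line. The additive constant
    epsilon keeps the result finite when ap is exactly 0 or 1.
    """
    return math.log((ap + epsilon) / (1.0 - ap + epsilon))


# One topic: relevant documents retrieved at ranks 1 and 3.
ap = average_precision([1, 0, 1, 0])
transformed = smoothed_logit(ap)
```

A bounded score such as average precision cannot be exactly normal across topics, because values pile up near 0 and 1. Mapping it to an unbounded scale is one plausible way to obtain a more symmetric per-topic distribution, and the smoothing constant prevents infinite values at the endpoints.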