Evaluation metrics play a critical role both in the comparative evaluation of retrieval systems and in learning-to-rank (LTR), where they serve as objective functions to be optimized. Many different evaluation metrics have been proposed in the IR literature, with average precision (AP) being the dominant one due to a number of desirable properties it possesses. However, most of these measures, including average precision, do not incorporate graded relevance. In this work, we propose a new measure of retrieval effectiveness, the Graded Average Precision (GAP). GAP generalizes average precision to the case of multi-graded relevance and inherits all the desirable characteristics of AP: it has a nice probabilistic interpretation, it approximates the area under a graded precision-recall curve, and it can be justified in terms of a simple but moderately plausible user model. We then evaluate GAP in terms of its informativeness and discriminative power. Finally, we show that GAP can reliably be used as an objective metric in learning to rank: optimizing for GAP using SoftRank and LambdaRank leads to better-performing ranking functions than those constructed by algorithms tuned to optimize for AP or NDCG, even when AP or NDCG is used as the test metric.
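To make the generalization concrete, the following is a minimal sketch, not the paper's exact formulation. It computes standard binary AP, and then a GAP-style score under one reading of the user model described above: a relevance-grade threshold is drawn at random, documents at or above the threshold are treated as relevant, and the score is the expected binary AP over that draw. The function names and the `threshold_probs` parameter are illustrative assumptions, not the paper's notation.

```python
def average_precision(rels):
    """Binary average precision: the mean of precision@k taken at
    each rank k that holds a relevant document.

    rels: list of 0/1 relevance labels down the ranking.
    """
    hits, precisions = 0, []
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    total_rel = sum(rels)
    return sum(precisions) / total_rel if total_rel else 0.0


def graded_ap_expectation(grades, threshold_probs):
    """Hypothetical GAP-style sketch (not the paper's exact formula):
    a grade threshold g is drawn with probability threshold_probs[g],
    documents with grade >= g count as relevant, and the score is the
    expected binary AP over that random binarization.

    grades: list of integer relevance grades down the ranking.
    threshold_probs: dict mapping threshold grade -> probability.
    """
    return sum(
        p * average_precision([1 if grade >= g else 0 for grade in grades])
        for g, p in threshold_probs.items()
    )


# Example: a 4-document ranking with grades on a 0-2 scale, and a user
# equally likely to care about grade >= 1 or only grade >= 2.
score = graded_ap_expectation([2, 0, 1, 2], {1: 0.5, 2: 0.5})
```

With binary labels and a point-mass threshold distribution, the expectation collapses to ordinary AP, which is the sense in which a measure like this "generalizes" AP rather than replacing it.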