The nDCG measure has proven to be a popular measure of retrieval effectiveness that utilizes graded relevance judgments. However, a number of different instantiations of nDCG exist, depending on the (somewhat arbitrary) definitions of the gain and discount functions used (1) to dictate the relative value of documents of different relevance grades and (2) to weight the importance of gain values at different ranks, respectively. In this work we discuss how to empirically derive gain and discount functions that optimize the efficiency or stability of nDCG. First, we describe a variance decomposition analysis framework and an optimization procedure used to find the efficiency- or stability-optimal gain and discount functions. Then we use TREC data sets to compare the optimal gain and discount functions to those that have appeared in the IR literature with respect to (a) the efficiency of the evaluation, (b) the induced ranking of systems, and (c) the discriminative power of the resulting nDCG measure.
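To make the role of the two functions concrete, the following is a minimal Python sketch of nDCG parameterized by a gain and a discount function. It is an illustrative sketch, not the optimization procedure described above: the function names (`dcg`, `ndcg`) and the example relevance grades are hypothetical, the default instantiation (exponential gain 2^g − 1 with logarithmic discount 1/log2(i+1)) is one common choice from the literature, and for simplicity the ideal DCG is computed from the grades of the retrieved list itself rather than the full pool of judged documents.

```python
import math
from typing import Callable, Sequence


def dcg(rels: Sequence[int],
        gain: Callable[[int], float],
        discount: Callable[[int], float]) -> float:
    """Discounted cumulative gain of a ranked list of relevance grades.

    gain(g) maps a relevance grade to a gain value; discount(i) weights
    the gain obtained at (1-based) rank i.
    """
    return sum(gain(g) * discount(i) for i, g in enumerate(rels, start=1))


def ndcg(rels: Sequence[int],
         gain: Callable[[int], float] = lambda g: 2 ** g - 1,
         discount: Callable[[int], float] = lambda i: 1.0 / math.log2(i + 1)) -> float:
    """nDCG: DCG of the ranking divided by the DCG of an ideal reordering.

    Note: the ideal ranking is approximated here by sorting the grades of
    the retrieved list itself, which slightly overstates nDCG when highly
    relevant documents were not retrieved at all.
    """
    ideal_dcg = dcg(sorted(rels, reverse=True), gain, discount)
    return dcg(rels, gain, discount) / ideal_dcg if ideal_dcg > 0 else 0.0


# Relevance grades of the top 5 retrieved documents
# (0 = non-relevant, 1 = relevant, 2 = highly relevant).
grades = [2, 0, 1, 0, 2]

# Exponential gain with logarithmic discount (one common instantiation).
print(ndcg(grades))

# Swapping in a linear gain and a reciprocal-rank discount yields a
# different instantiation of the same measure.
print(ndcg(grades, gain=lambda g: g, discount=lambda i: 1.0 / i))
```

Because every such (gain, discount) pair defines a different nDCG variant, the choice directly affects how stable and how discriminative the resulting measure is, which is exactly the degree of freedom the optimization described in the abstract searches over.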