We focus on the choice of gains in nDCG, and in particular on the mismatch between the large differences in the exponential gains assigned to high relevance labels and the not-so-small confusion, or inconsistency, between these labels in the data. We show that better gains can be derived from the data by measuring label inconsistency, to the point that virtually indistinguishable labels receive equal gains. The resulting optimal gains yield a better nDCG objective for training Learning to Rank algorithms.
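To illustrate why the gain table matters, here is a minimal Python sketch. It assumes the common formulation of nDCG with exponential gains 2^label - 1 and a 1/log2(rank + 1) discount. The `merged_gains` table is purely hypothetical: it only illustrates what giving equal gains to two hard-to-distinguish labels looks like, and is not the gain set derived in the paper. The function names and the five-level label scale are likewise assumptions made for the example.

```python
import math

def dcg(labels, gains):
    # Discounted cumulative gain; rank is 0-based, so the discount is log2(rank + 2).
    return sum(gains[l] / math.log2(rank + 2) for rank, l in enumerate(labels))

def ndcg(labels, gains):
    # Normalize by the DCG of the ideal ordering (highest gain first).
    ideal = dcg(sorted(labels, key=lambda l: gains[l], reverse=True), gains)
    return dcg(labels, gains) / ideal if ideal > 0 else 0.0

# Standard exponential gains on an assumed 0..4 label scale: 0, 1, 3, 7, 15.
exp_gains = {l: 2 ** l - 1 for l in range(5)}

# Hypothetical table in which labels 3 and 4 are treated as indistinguishable.
merged_gains = {0: 0, 1: 1, 2: 3, 3: 7, 4: 7}

ranking_a = [4, 3, 0]  # label-4 document ranked first
ranking_b = [3, 4, 0]  # labels 3 and 4 swapped

for name, gains in [("exponential", exp_gains), ("merged", merged_gains)]:
    print(name, ndcg(ranking_a, gains), ndcg(ranking_b, gains))
```

Under exponential gains, swapping the two top documents lowers nDCG noticeably, even if judges cannot reliably tell labels 3 and 4 apart. Under the merged table the two rankings score the same, which is the behavior the abstract describes for virtually indistinguishable labels.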