We consider the impact of inter-assessor disagreement on the maximum performance that a ranker can hope to achieve. We demonstrate that even if a ranker were to achieve perfect performance with respect to a given assessor, when evaluated with respect to a different assessor, the measured performance of the ranker decreases significantly. This decrease in performance may largely account for observed limits on the performance of learning-to-rank algorithms.
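A minimal sketch of the effect described above (not taken from the paper): a ranking that is perfect with respect to assessor A is scored against assessor B, whose binary judgments disagree with A's on some fraction of documents. The corpus size, relevance rate, and disagreement rate are illustrative assumptions, and average precision stands in for whatever evaluation measure is used.

```python
import random

def average_precision(ranking, relevant):
    """Average precision of a ranked doc-id list against a set of relevant doc ids."""
    hits, score = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            score += hits / i
    return score / max(len(relevant), 1)

random.seed(0)
n_docs, p_relevant, p_disagree = 1000, 0.1, 0.3  # illustrative parameters

# Assessor A's binary judgments; assessor B flips each one with probability p_disagree.
rel_a = {d for d in range(n_docs) if random.random() < p_relevant}
rel_b = {d for d in range(n_docs)
         if (d in rel_a) != (random.random() < p_disagree)}

# A ranker that is "perfect" for assessor A: all of A's relevant documents come first.
ranking = sorted(range(n_docs), key=lambda d: d not in rel_a)

print("AP against assessor A:", round(average_precision(ranking, rel_a), 3))  # 1.0
print("AP against assessor B:", round(average_precision(ranking, rel_b), 3))  # well below 1.0
```

Under these assumed settings the same ranking that scores 1.0 against assessor A scores markedly lower against assessor B, which is the sense in which inter-assessor disagreement caps the measured performance any ranker can achieve.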