In information retrieval, relevance judgments play an important role: they are required both for evaluating the quality of retrieval systems and for training learning-to-rank algorithms. In recent years, researchers in industry have published numerous papers that use judgments obtained from a commercial search engine. Since typically no information is provided about the quality of these judgments, their reliability for evaluating or training retrieval systems remains questionable. In this paper, we analyze the reliability of such judgments for evaluating the quality of retrieval systems by comparing them to the judgments made by NIST assessors at TREC.
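As a rough illustration of how such a comparison is commonly operationalized, the sketch below scores a set of system runs under two judgment sets (for example, NIST/TREC judgments and judgments from a commercial search engine) using mean average precision, and then measures how similar the two induced system rankings are with Kendall's tau. The data structures, the choice of average precision, and the tie-ignoring tau are assumptions made for the sake of the example, not the exact protocol of this paper.

# Illustrative sketch (assumed setup, not the paper's exact method):
# rank the same retrieval runs under two sets of relevance judgments
# and check how strongly the two system rankings agree.

from itertools import combinations


def average_precision(ranked_docs, relevant):
    # Average precision of one ranked list against a set of relevant doc ids.
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_ap(runs, judgments):
    # Mean average precision per system.
    # runs: {system: {topic: [doc ids in ranked order]}}
    # judgments: {topic: set of relevant doc ids}
    scores = {}
    for system, topics in runs.items():
        aps = [average_precision(topics.get(t, []), rel)
               for t, rel in judgments.items()]
        scores[system] = sum(aps) / len(aps)
    return scores


def kendall_tau(scores_a, scores_b):
    # Kendall's tau between the system orderings induced by two score dicts
    # (ties are ignored in this simplified version).
    concordant = discordant = 0
    for s1, s2 in combinations(sorted(scores_a), 2):
        prod = (scores_a[s1] - scores_a[s2]) * (scores_b[s1] - scores_b[s2])
        if prod > 0:
            concordant += 1
        elif prod < 0:
            discordant += 1
    n_pairs = concordant + discordant
    return (concordant - discordant) / n_pairs if n_pairs else 1.0


# Hypothetical usage: score the same runs under NIST/TREC judgments and under
# the industrial judgments, then compare the two rankings.
# tau = kendall_tau(mean_ap(runs, trec_judgments), mean_ap(runs, industry_judgments))

A high tau would indicate that the two judgment sets rank systems similarly, which is the sense of "reliability for evaluation" that such comparisons typically target.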