In information retrieval, relevance judgments play an important role: they are required both for evaluating the quality of retrieval systems and for training learning-to-rank algorithms. In recent years, researchers in industry have published numerous papers that use judgments obtained from a commercial search engine. Since typically no information is provided about the quality of these judgments, their reliability for evaluating or training retrieval systems remains questionable. In this paper, we analyze the reliability of such judgments for evaluating the quality of retrieval systems by comparing them to the judgments made by NIST assessors at TREC.
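As a rough illustration of how such a comparison is commonly operationalized, the sketch below scores a set of system runs under two judgment sets (for example, NIST/TREC judgments and judgments from a commercial search engine) using mean average precision, and then measures how similar the two induced system rankings are with Kendall's tau. The data structures, the choice of average precision, and the tie-ignoring tau are assumptions made for the sake of the example, not the exact protocol of this paper.

# Illustrative sketch (assumed setup, not the paper's exact method):
# rank the same retrieval runs under two sets of relevance judgments
# and check how strongly the two system rankings agree.

from itertools import combinations


def average_precision(ranked_docs, relevant):
    # Average precision of one ranked list against a set of relevant doc ids.
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_ap(runs, judgments):
    # Mean average precision per system.
    # runs: {system: {topic: [doc ids in ranked order]}}
    # judgments: {topic: set of relevant doc ids}
    scores = {}
    for system, topics in runs.items():
        aps = [average_precision(topics.get(t, []), rel)
               for t, rel in judgments.items()]
        scores[system] = sum(aps) / len(aps)
    return scores


def kendall_tau(scores_a, scores_b):
    # Kendall's tau between the system orderings induced by two score dicts
    # (ties are ignored in this simplified version).
    concordant = discordant = 0
    for s1, s2 in combinations(sorted(scores_a), 2):
        prod = (scores_a[s1] - scores_a[s2]) * (scores_b[s1] - scores_b[s2])
        if prod > 0:
            concordant += 1
        elif prod < 0:
            discordant += 1
    n_pairs = concordant + discordant
    return (concordant - discordant) / n_pairs if n_pairs else 1.0


# Hypothetical usage: score the same runs under NIST/TREC judgments and under
# the industrial judgments, then compare the two rankings.
# tau = kendall_tau(mean_ap(runs, trec_judgments), mean_ap(runs, industry_judgments))

A high tau would indicate that the two judgment sets rank systems similarly, which is the sense of "reliability for evaluation" that such comparisons typically target.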