How can we run large-scale, community-wide evaluations of information retrieval systems if we lack the ability to distribute the document collection on which the task is based? This was the challenge we faced in the TREC Microblog tracks over the past few years. In this paper, we present a novel evaluation methodology we dub "evaluation as a service", which was implemented at TREC 2013 to address restrictions on data redistribution. The basic idea is that instead of distributing the document collection, we (the track organizers) provided a service API "in the cloud" with which participants could accomplish the evaluation task. We outline advantages as well as disadvantages of this evaluation methodology, and discuss how the approach might be extended to other evaluation scenarios.
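To make the "evaluation as a service" idea concrete, the sketch below shows how a participant might query a remote search API instead of running a local index over a distributed collection. It is a minimal illustration only: the endpoint URL, parameter names, access token, and JSON response shape are assumptions for the sake of the example, not the actual service API used in the TREC Microblog tracks.

```python
# Hypothetical "evaluation as a service" client sketch.
# The endpoint, parameters, and response format are illustrative
# assumptions, not the real TREC Microblog track API.

import json
import urllib.parse
import urllib.request

SERVICE_URL = "https://example.org/api/search"  # hypothetical endpoint


def search(query, max_results=1000, token="YOUR-ACCESS-TOKEN"):
    """Send a query to the (hypothetical) evaluation service and
    return a ranked list of (docid, score) pairs."""
    params = urllib.parse.urlencode({
        "q": query,
        "limit": max_results,
        "token": token,  # participants authenticate; they never hold the raw data
    })
    with urllib.request.urlopen(f"{SERVICE_URL}?{params}") as resp:
        results = json.load(resp)
    # Assumed response shape: [{"docid": ..., "score": ...}, ...]
    return [(hit["docid"], hit["score"]) for hit in results]


if __name__ == "__main__":
    # Runs are built entirely from API responses, so the document
    # collection stays on the organizers' infrastructure.
    for rank, (docid, score) in enumerate(search("earthquake relief"), start=1):
        print(f"TOPIC-1 Q0 {docid} {rank} {score} my-run")
```

The output lines follow the standard TREC run format (topic, "Q0", document id, rank, score, run tag), so results obtained through such a service could be scored with the usual evaluation tooling.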