How can we run large-scale, community-wide evaluations of information retrieval systems if we lack the ability to distribute the document collection on which the task is based? This was the challenge we faced in the TREC Microblog tracks over the past few years. In this paper, we present a novel evaluation methodology we dub "evaluation as a service", which was implemented at TREC 2013 to address restrictions on data redistribution. The basic idea is that instead of distributing the document collection, we (the track organizers) provided a service API "in the cloud" with which participants could accomplish the evaluation task. We outline advantages as well as disadvantages of this evaluation methodology, and discuss how the approach might be extended to other evaluation scenarios.
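To make the "evaluation as a service" idea concrete, the sketch below shows how a participant might query a remote search API instead of running a local index over a distributed collection. It is a minimal illustration only: the endpoint URL, parameter names, access token, and JSON response shape are assumptions for the sake of the example, not the actual service API used in the TREC Microblog tracks.

```python
# Hypothetical "evaluation as a service" client sketch.
# The endpoint, parameters, and response format are illustrative
# assumptions, not the real TREC Microblog track API.

import json
import urllib.parse
import urllib.request

SERVICE_URL = "https://example.org/api/search"  # hypothetical endpoint


def search(query, max_results=1000, token="YOUR-ACCESS-TOKEN"):
    """Send a query to the (hypothetical) evaluation service and
    return a ranked list of (docid, score) pairs."""
    params = urllib.parse.urlencode({
        "q": query,
        "limit": max_results,
        "token": token,  # participants authenticate; they never hold the raw data
    })
    with urllib.request.urlopen(f"{SERVICE_URL}?{params}") as resp:
        results = json.load(resp)
    # Assumed response shape: [{"docid": ..., "score": ...}, ...]
    return [(hit["docid"], hit["score"]) for hit in results]


if __name__ == "__main__":
    # Runs are built entirely from API responses, so the document
    # collection stays on the organizers' infrastructure.
    for rank, (docid, score) in enumerate(search("earthquake relief"), start=1):
        print(f"TOPIC-1 Q0 {docid} {rank} {score} my-run")
```

The output lines follow the standard TREC run format (topic, "Q0", document id, rank, score, run tag), so results obtained through such a service could be scored with the usual evaluation tooling.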