Reusable test collections through experimental design

  • Authors:
  • Ben Carterette (University of Delaware, Newark, DE, USA)
  • Evangelos Kanoulas (University of Sheffield, Sheffield, United Kingdom)
  • Virgil Pavlu (Northeastern University, Boston, MA, USA)
  • Hui Fang (University of Delaware, Newark, DE, USA)

  • Venue:
  • Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 2010


Abstract

Portable, reusable test collections are a vital part of research and development in information retrieval. Reusability is difficult to assess, however. The standard approach, simulating judgment collection by holding out groups of systems and then evaluating those held-out systems, only works when there is a large set of relevance judgments to draw on during the simulation. As test collections adapt to larger and larger corpora, it becomes less and less likely that there will be sufficient judgments for such simulation experiments. Thus we propose a methodology for information retrieval experimentation that collects evidence for or against the reusability of a test collection while judgments are being made. Using this methodology along with the appropriate statistical analyses, researchers will be able to estimate the reusability of their test collections while building them and implement "course corrections" if the collection does not seem to be achieving desired levels of reusability. We show the robustness of our design to inherent sources of variance, and provide a description of an actual implementation of the framework for creating a large test collection.
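For concreteness, here is a minimal sketch (in Python, not taken from the paper) of the standard hold-out simulation the abstract refers to. The input names (`runs`, `qrels`, `groups`, `depth`) are hypothetical, and precision@k with Kendall's tau are illustrative choices of effectiveness measure and agreement statistic:

    # Sketch of hold-out reusability simulation. Assumed inputs:
    #   runs   -- system name -> {query_id: ranked list of doc ids}
    #   qrels  -- query_id -> set of doc ids judged relevant
    #   groups -- list of system-name sets to hold out, one at a time
    from scipy.stats import kendalltau

    def pool(runs, depth):
        """Depth-k pool: per query, the union of each run's top-`depth` docs."""
        pooled = {}
        for ranked_lists in runs.values():
            for qid, docs in ranked_lists.items():
                pooled.setdefault(qid, set()).update(docs[:depth])
        return pooled

    def precision_at_k(run, qrels, pooled, k=10):
        """Mean precision@k, crediting only pooled documents judged relevant."""
        scores = []
        for qid, docs in run.items():
            judged_rel = qrels.get(qid, set()) & pooled.get(qid, set())
            scores.append(sum(d in judged_rel for d in docs[:k]) / k)
        return sum(scores) / len(scores)

    def holdout_reusability(runs, qrels, groups, depth=100):
        """For each held-out group: re-pool from the remaining systems only,
        score the held-out systems against both pools, and report Kendall's
        tau between the two score orderings."""
        full_pool = pool(runs, depth)
        taus = []
        for group in groups:
            kept = {name: r for name, r in runs.items() if name not in group}
            reduced_pool = pool(kept, depth)
            held = sorted(group)
            full_scores = [precision_at_k(runs[s], qrels, full_pool) for s in held]
            red_scores = [precision_at_k(runs[s], qrels, reduced_pool) for s in held]
            tau, _ = kendalltau(full_scores, red_scores)
            taus.append(tau)
        return taus

High tau values across held-out groups suggest the judgments still rank systems that did not contribute to the pool correctly. As the abstract notes, running this simulation at all presupposes a judgment set large enough to survive re-pooling, which is exactly the condition that fails for very large corpora and motivates assessing reusability during judging instead.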