Test theory for assessing IR test collections

  • Authors:
  • David Bodoff; Pu Li

  • Affiliations:
  • University of Haifa, Haifa, Israel; State University of New York at Buffalo, NY

  • Venue:
  • SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 2007

Abstract

How good is an IR test collection? A series of papers in recent years has addressed this question by empirically measuring the consistency of performance comparisons across alternate subsets of the collection. In this paper we propose using Test Theory, which is based on analysis of variance and is specifically designed to assess test collections. With this method, we can not only measure test reliability after the fact, but also estimate a test collection's reliability before it is even built or used. We can also determine an optimal allocation of resources in advance, e.g., whether to invest in more judges or more queries. The method, which is in widespread use in the field of educational testing, complements data-driven approaches to assessing test collections. Whereas data-driven methods focus on test results, Test Theory focuses on test designs. It offers unique practical results, as well as insights into the variety and implications of alternative test designs.
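
To illustrate the kind of computation this involves, the sketch below estimates variance components for a fully crossed systems × queries design and derives a generalizability coefficient, which can be projected to a different number of queries before a collection is built. This is a generic one-facet Generalizability Theory illustration under assumed conditions, not the paper's own procedure; the function name g_coefficient, the simulated score matrix, and the projected query count are hypothetical.

```python
# Minimal sketch (assumptions, not the authors' code): one-facet G-Theory
# reliability for ranking systems on a fully crossed systems x queries design.
import numpy as np

def g_coefficient(scores, n_q_new=None):
    """scores: (n_systems, n_queries) matrix of per-query effectiveness scores.
    Returns the generalizability coefficient for relative (ranking) decisions,
    optionally projected to a hypothetical query-set size n_q_new."""
    n_s, n_q = scores.shape
    grand = scores.mean()
    sys_means = scores.mean(axis=1)   # mean score per system
    qry_means = scores.mean(axis=0)   # mean score per query

    # Two-way ANOVA without replication: mean squares for system and residual.
    ms_sys = n_q * np.sum((sys_means - grand) ** 2) / (n_s - 1)
    resid = scores - sys_means[:, None] - qry_means[None, :] + grand
    ms_res = np.sum(resid ** 2) / ((n_s - 1) * (n_q - 1))

    # Variance components (clipped at zero; negative estimates can occur).
    var_sys = max((ms_sys - ms_res) / n_q, 0.0)
    var_res = ms_res

    # Reliability of system comparisons with n queries of this kind.
    n = n_q_new if n_q_new is not None else n_q
    return var_sys / (var_sys + var_res / n)

# Hypothetical example: 10 systems, 50 queries, with a real system effect.
rng = np.random.default_rng(0)
scores = 0.5 * rng.beta(2, 5, size=(10, 50)) + np.linspace(0.1, 0.3, 10)[:, None]
print(g_coefficient(scores))                 # reliability of the current design
print(g_coefficient(scores, n_q_new=200))    # projected reliability with 200 queries
```

The "before the fact" use described in the abstract corresponds to the n_q_new projection: once the variance components are estimated (from pilot data or a comparable collection), the same formula predicts how reliability changes as queries are added or removed, informing where to invest assessment effort.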