A comparison of pooled and sampled relevance judgments

  • Authors:
  • Ian Soboroff

  • Affiliations:
  • National Institute of Standards and Technology, Gaithersburg, MD

  • Venue:
  • SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2007

Abstract

Test collections are most useful when they are reusable, that is, when they can be reliably used to rank systems that did not contribute to the pools. Pooled relevance judgments for very large collections may not be reusable for two reasons: they will be very sparse and not sufficiently complete, and they may be biased in the sense that they will unfairly rank some class of systems. The TREC 2006 terabyte track judged both a pool and a deep random sample in order to measure the effects of sparseness and bias.