How reliable are the results of large-scale information retrieval experiments?

  • Authors:
  • Justin Zobel

  • Affiliations:
  • Department of Computer Science, RMIT, GPO Box 2476V, Melbourne 3001, Australia

  • Venue:
  • Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 1998

Abstract

Two stages in measurement of techniques for information retrieval are gathering of documents for relevance assessment and use of the assessments to numerically evaluate effectiveness. We consider both of these stages in the context of the TREC experiments, to determine whether they lead to measurements that are trustworthy and fair. Our detailed empirical investigation of the TREC results shows that the measured relative performance of systems appears to be reliable, but that recall is overestimated: it is likely that many relevant documents have not been found. We propose a new pooling strategy that can significantly increase the number of relevant documents found for given effort, without compromising fairness.
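
To make the recall claim concrete, the following is a minimal sketch of conventional depth-k pooling, in which only the top k documents from each system's ranking are judged for relevance. It is an illustration under stated assumptions, not the paper's method: the new pooling strategy proposed in the paper is not described in the abstract and is not reproduced here. The function names (`depth_k_pool`, `recall`), the system names, and the toy document identifiers are all hypothetical.

```python
from typing import Dict, List, Set


def depth_k_pool(runs: Dict[str, List[str]], k: int) -> Set[str]:
    """Union of the top-k documents from every system's ranked run."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


def recall(ranking: List[str], relevant: Set[str]) -> float:
    """Fraction of the given relevant set that appears in the ranking."""
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranking)) / len(relevant)


# Hypothetical toy data: three systems, one topic.
runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d2", "d5", "d1", "d6"],
    "sysC": ["d7", "d1", "d8", "d9"],
}
# The full relevant set is unknown in a real experiment; it is assumed
# here only so the two recall figures can be compared.
true_relevant = {"d1", "d3", "d6", "d9"}

pool = depth_k_pool(runs, k=2)          # only pooled documents are judged
judged_relevant = pool & true_relevant  # relevant documents the assessors see

measured = recall(runs["sysA"], judged_relevant)  # recall against judged set
actual = recall(runs["sysA"], true_relevant)      # recall against full set

print(f"measured recall: {measured:.2f}, actual recall: {actual:.2f}")
```

In this toy case the pool contains only one of the four relevant documents, so recall computed against the judged set is higher than recall against the full relevant set. This is the sense in which unjudged relevant documents can cause recall to be overestimated.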