Relevance information: a loss of entropy but a gain for IDF?

  • Authors:
  • Arjen P. de Vries; Thomas Roelleke

  • Affiliations:
  • CWI, The Netherlands; Queen Mary University of London, London, United Kingdom

  • Venue:
  • Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2005

Abstract

When investigating alternative estimates for term discriminativeness, we discovered that relevance information and idf are much more closely related than formulated in the classical literature. Therefore, we revisited the justification of idf as it follows from the binary independent retrieval (BIR) model. The main result is a formal framework uncovering the close relationship between a generalised idf and the BIR model. The framework makes explicit how to incorporate relevance information into any retrieval function that involves an idf component.

In addition to the idf-based formulation of the BIR model, we propose Poisson-based estimates as an alternative to the classical estimates, motivated by the superiority of Poisson-based estimates for within-document term frequencies. The main experimental finding is that a Poisson-based idf is superior to the classical idf, and the superiority is particularly evident for long queries.
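
The link between idf and the BIR model mentioned in the abstract can be illustrated with a minimal sketch. It assumes the classical frequency-based estimate P(t|c) = n(t,c)/N(c) and defines a generalised idf over any document set c as idf(t,c) = -log P(t|c). Under that assumption, the term-presence part of the BIR weight, log(P(t|relevant)/P(t|non-relevant)), equals the idf over the non-relevant set minus the idf over the relevant set. The function names and the restriction to term presence are illustrative choices, not taken from the paper, and the Poisson-based estimate is not reproduced here because the abstract does not define it.

```python
import math


def idf(n_t: int, n_docs: int) -> float:
    """Generalised idf over a document set: -log(n_t / n_docs).

    n_t    -- number of documents in the set that contain the term
    n_docs -- total number of documents in the set
    Assumes the classical frequency-based estimate P(t|c) = n_t / n_docs.
    """
    if not 0 < n_t <= n_docs:
        raise ValueError("need 0 < n_t <= n_docs")
    return -math.log(n_t / n_docs)


def bir_presence_weight(r_t: int, n_rel: int, n_t: int, n_all: int) -> float:
    """Term-presence part of the BIR weight, written as a difference of idf values.

    r_t   -- relevant documents containing the term
    n_rel -- relevant documents in total
    n_t   -- documents in the collection containing the term
    n_all -- documents in the collection in total
    Hypothetical helper: it ignores the term-absence factor of the full
    BIR weight and applies no smoothing.
    """
    idf_non_relevant = idf(n_t - r_t, n_all - n_rel)
    idf_relevant = idf(r_t, n_rel)
    return idf_non_relevant - idf_relevant


if __name__ == "__main__":
    # Toy counts, invented for illustration only.
    print(idf(100, 1000))
    print(bir_presence_weight(r_t=5, n_rel=10, n_t=100, n_all=1000))
```

In this sketch a term gains weight whenever it is more frequent among the relevant documents than among the non-relevant ones. When no relevance information is available, the non-relevant set is commonly approximated by the whole collection, which leaves the ordinary collection-wide idf.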