The role of variance in term weighting for probabilistic information retrieval

  • Authors:
  • Warren R. Greiff; William T. Morgan; Jay M. Ponte

  • Affiliations:
  • The MITRE Corporation, Bedford, Massachusetts; The MITRE Corporation, Bedford, Massachusetts; The MITRE Corporation, Bedford, Massachusetts

  • Venue:
  • Proceedings of the Eleventh International Conference on Information and Knowledge Management
  • Year:
  • 2002

Abstract

In probabilistic approaches to information retrieval, the occurrence of a query term in a document contributes to the probability that the document will be judged relevant. It is typically assumed that the weight assigned to a query term should be based on the expected value of that contribution. In this paper we show that the degree to which observable document features such as term frequencies are expected to vary is also important. By means of stochastic simulation, we show that increased variance results in degraded retrieval performance. We further show that by decreasing term weights in the presence of variance, this degradation can be reduced. Hence, probabilistic models of information retrieval must take into account not only the expected value of a query term's contribution but also the variance of document features.
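
The effect described in the abstract can be illustrated with a toy stochastic simulation. The sketch below is not the authors' simulation. It rests on assumptions made only for illustration: two query terms, a Gaussian document feature per term centred on 1 for relevant documents and 0 otherwise, a weighted-sum score, and AUC as the performance measure. The function name `simulate_auc` and all parameter values are hypothetical, and the inverse-variance weights in the last run are a standard statistical choice swapped in here, not a weighting scheme taken from the paper.

```python
import random


def simulate_auc(noise_sd, weights, n_docs=4000, seed=0):
    """Return the AUC of a weighted-sum score over synthetic documents.

    noise_sd: per-term standard deviation of the document feature (assumed Gaussian).
    weights:  per-term weight applied to the observed feature.
    """
    rng = random.Random(seed)
    scored = []
    for _ in range(n_docs):
        relevant = rng.random() < 0.5
        # Assumption: each feature is centred on 1 if relevant, 0 otherwise.
        base = 1.0 if relevant else 0.0
        features = [base + rng.gauss(0.0, sd) for sd in noise_sd]
        score = sum(w * x for w, x in zip(weights, features))
        scored.append((score, relevant))

    # Rank-based AUC: probability that a relevant document outscores
    # a non-relevant one (ties are negligible for continuous scores).
    scored.sort(key=lambda pair: pair[0])
    n_pos = sum(1 for _, rel in scored if rel)
    n_neg = len(scored) - n_pos
    rank_sum = sum(rank for rank, (_, rel) in enumerate(scored, start=1) if rel)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Both terms equally noisy, equal weights.
low_var = simulate_auc([1.0, 1.0], [1.0, 1.0])
# Second term's feature has three times the standard deviation, weights unchanged.
high_var = simulate_auc([1.0, 3.0], [1.0, 1.0])
# Same noisy term, but down-weighted by its inverse variance (1 / 3**2).
reweighted = simulate_auc([1.0, 3.0], [1.0, 1.0 / 9.0])

print(f"low variance:            AUC = {low_var:.3f}")
print(f"high variance:           AUC = {high_var:.3f}")
print(f"high variance, reweight: AUC = {reweighted:.3f}")
```

Under these assumptions, the high-variance run scores worse than the low-variance run, and reducing the weight of the noisy term recovers part of the loss without restoring the low-variance level. This mirrors the qualitative claim of the abstract only. It says nothing about the size of the effect on real collections.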