The role of variance in term weighting for probabilistic information retrieval

Authors:
Warren R. Greiff; William T. Morgan; Jay M. Ponte
Affiliations:
The MITRE Corporation, Bedford, Massachusetts (all authors)
Venue:
Proceedings of the eleventh international conference on Information and knowledge management
Year:
2002

Abstract

In probabilistic approaches to information retrieval, the occurrence of a query term in a document contributes to the probability that the document will be judged relevant. It is typically assumed that the weight assigned to a query term should be based on the expected value of that contribution. In this paper we show that the degree to which observable document features such as term frequencies are expected to vary is also important. By means of stochastic simulation, we show that increased variance results in degraded retrieval performance. We further show that by decreasing term weights in the presence of variance, this degradation can be reduced. Hence, probabilistic models of information retrieval must take into account not only the expected value of a query term's contribution but also the variance of document features.
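The effect described in the abstract can be illustrated with a toy stochastic simulation. The sketch below is not the authors' experimental setup. It assumes two hypothetical query-term features that are Gaussian, with mean `signal` for relevant documents and 0 for non-relevant ones, and it measures ranking quality as the fraction of (relevant, non-relevant) pairs ordered correctly. The down-weighting used here, inverse-variance weighting, is an illustrative choice and not necessarily the weighting the paper proposes. All function and parameter names are made up for this example.

```python
import random
from bisect import bisect_left


def simulate_scores(weights, sigmas, n_docs, relevant, rng, signal=1.0):
    """Draw n_docs documents and return their weighted retrieval scores.

    Assumption: each term feature is Gaussian with mean `signal` for
    relevant documents and 0 otherwise, and a per-term standard deviation.
    """
    mean = signal if relevant else 0.0
    return [
        sum(w * rng.gauss(mean, s) for w, s in zip(weights, sigmas))
        for _ in range(n_docs)
    ]


def pairwise_accuracy(pos, neg):
    """Fraction of (relevant, non-relevant) pairs ranked in the right order."""
    neg_sorted = sorted(neg)
    # bisect_left counts the non-relevant scores strictly below each relevant score.
    correct = sum(bisect_left(neg_sorted, p) for p in pos)
    return correct / (len(pos) * len(neg))


def run(weights, sigmas, n_docs=2000, seed=0):
    """Simulate one collection and return its ranking quality."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    pos = simulate_scores(weights, sigmas, n_docs, True, rng)
    neg = simulate_scores(weights, sigmas, n_docs, False, rng)
    return pairwise_accuracy(pos, neg)


# Both terms have low variance, equal weights.
low_var = run(weights=(1.0, 1.0), sigmas=(1.0, 1.0))
# Second term's feature is much noisier, weights unchanged.
high_var = run(weights=(1.0, 1.0), sigmas=(1.0, 3.0))
# Same noisy feature, but its weight is reduced by 1 / sigma^2 (assumed scheme).
reweighted = run(weights=(1.0, 1.0 / 9.0), sigmas=(1.0, 3.0))

print(f"low variance, equal weights:    {low_var:.3f}")
print(f"high variance, equal weights:   {high_var:.3f}")
print(f"high variance, reduced weight:  {reweighted:.3f}")
```

Under these assumptions, the noisier second feature lowers ranking quality when both terms keep equal weights, and reducing that term's weight recovers part of the loss. This mirrors the qualitative claim of the abstract, not its reported results.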