Probabilistic retrieval based on staged logistic regression
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Inferring probability of relevance using the method of logistic regression
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Large test collection experiments on an operational, interactive system: Okapi at TREC
TREC-2 Proceedings of the second conference on Text retrieval conference
A language modeling approach to information retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A hidden Markov model information retrieval system
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Information retrieval as statistical translation
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A general language model for information retrieval
Proceedings of the eighth international conference on Information and knowledge management
On Relevance, Probabilistic Indexing and Information Retrieval
Journal of the ACM (JACM)
A New Term Significance Weighting Approach
Journal of Intelligent Information Systems
Estimation and use of uncertainty in pseudo-relevance feedback
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Hi-index | 0.00 |
In probabilistic approaches to information retrieval, the occurrence of a query term in a document contributes to the probability that the document will be judged relevant. It is typically assumed that the weight assigned to a query term should be based on the expected value of that contribution. In this paper we show that the degree to which observable document features such as term frequencies are expected to vary is also important. By means of stochastic simulation, we show that increased variance results in degraded retrieval performance. We further show that by decreasing term weights in the presence of variance, this degradation can be reduced. Hence, probabilistic models of information retrieval must take into account not only the expected value of a query term's contribution but also the variance of document features.