In this paper, we propose a standard document retrieval score based on term frequencies. We model the within-document frequency of each term as a random variable. The standard score is then used to transform each random variable into a regularised form, so that the variables can be effectively combined into a standard document score. The standardisation imposes no constraints on the choice of probability distribution for the term frequencies. We show that the standardisation automatically yields a measure of term specificity. Analysis shows that this measure is highly correlated with the traditional idf measure and, furthermore, suggests a novel interpretation and justification of idf-like measures. In experiments on a number of different TREC collections, we show that the standard document score model performs comparably to BM25. An advantage of the standard document score model, however, is that the document scores it outputs are dimensionless quantities and are therefore comparable across different queries and collections in certain circumstances.
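To make the idea concrete, the following is a minimal sketch, not the paper's exact formulation. It assumes the "standard score" is the usual statistical z-score, computed per term from the mean and standard deviation of that term's frequency over the collection, and that a document's score is the sum of the z-scores of the query terms. The function name `standard_document_scores`, the whitespace tokenisation, and the absence of any document-length normalisation are all illustrative assumptions.

```python
import math
from collections import Counter


def standard_document_scores(docs, query):
    """Score each document as the sum of per-term z-scores (illustrative only).

    Assumption: the z-score of a term's within-document frequency is taken
    against that term's mean and standard deviation over the whole collection.
    """
    term_counts = [Counter(doc.split()) for doc in docs]
    n_docs = len(docs)
    scores = [0.0] * n_docs

    for term in set(query.split()):
        freqs = [counts[term] for counts in term_counts]
        mean = sum(freqs) / n_docs
        variance = sum((f - mean) ** 2 for f in freqs) / n_docs
        if variance == 0.0:
            # A term with the same frequency in every document cannot
            # discriminate between documents, so it contributes nothing.
            continue
        std_dev = math.sqrt(variance)
        for i, f in enumerate(freqs):
            # Dimensionless contribution: frequency in units of std. deviations.
            scores[i] += (f - mean) / std_dev

    return scores


docs = ["a b b", "a c", "a d", "a e"]
scores = standard_document_scores(docs, "a b")
```

In this toy collection the term `a` occurs once in every document and is ignored, while the rare term `b` has a low mean and a small standard deviation, so the one document that contains it receives a large positive score. This is one way to see how a specificity effect resembling idf can emerge from standardisation alone, under the assumptions stated above.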