A Standard Document Score for Information Retrieval

Authors:
Ronan Cummins
Affiliations:
Department of Computing and Information Systems, School of Computing and Mathematical Sciences, University of Greenwich, UK
Venue:
Proceedings of the 2013 Conference on the Theory of Information Retrieval
Year:
2013

Abstract

In this paper, we propose a standard document retrieval score based on term frequencies. We model the within-document frequency of each term as a random variable. The standard score (z-score) is then used to transform each random variable into a regularised form, so that the variables can be combined effectively into a standard document score. The standardisation imposes no constraints on the choice of probability distribution for the term frequencies. We show that the standardisation automatically yields a measure of term specificity. Analysis shows that this measure is highly correlated with the traditional idf measure and suggests a novel interpretation and justification of idf-like measures. In experiments on a number of TREC collections, we show that the standard document score model performs comparably with BM25. An advantage of the model, however, is that the document scores it outputs are dimensionless quantities and are therefore comparable across different queries and collections in certain circumstances.
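
To make the idea of standardising term frequencies concrete, the following is a minimal Python sketch. It is not the paper's exact formulation, which the abstract does not give. It assumes that each term's mean and standard deviation are estimated over all documents in the collection, and that per-term standard scores are combined by simple summation. The function names `term_stats` and `standard_document_score` are hypothetical.

```python
import math
from collections import Counter


def term_stats(docs, term):
    """Mean and population standard deviation of a term's frequency over docs.

    Assumption: statistics are estimated over every document in the
    collection, including those in which the term does not occur.
    """
    tfs = [Counter(doc)[term] for doc in docs]
    mean = sum(tfs) / len(tfs)
    variance = sum((tf - mean) ** 2 for tf in tfs) / len(tfs)
    return mean, math.sqrt(variance)


def standard_document_score(doc, query, docs):
    """Sum of per-term standard scores (z-scores) for the query terms.

    Assumption: summation is used to combine the standardised variables;
    the paper may combine them differently.
    """
    counts = Counter(doc)
    score = 0.0
    for term in query:
        mean, std = term_stats(docs, term)
        if std == 0.0:
            # The term has the same frequency in every document (for
            # example, it never occurs), so it cannot discriminate.
            continue
        score += (counts[term] - mean) / std
    return score


if __name__ == "__main__":
    # Toy collection: each document is a list of tokens.
    docs = [["rare", "common"], ["common"], ["common"], ["other"]]
    for query in (["rare"], ["common"]):
        print(query, standard_document_score(docs[0], query, docs))
```

In this toy collection, a single occurrence of the rare term receives a higher standard score than a single occurrence of the common term, which illustrates the idf-like term specificity described in the abstract. Because each per-term score is a deviation divided by a standard deviation, the result carries no units.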