A Standard Document Score for Information Retrieval

  • Authors: Ronan Cummins

  • Affiliations: Department of Computing and Information Systems, School of Computing and Mathematical Sciences, University of Greenwich, UK

  • Venue: Proceedings of the 2013 Conference on the Theory of Information Retrieval
  • Year: 2013

Abstract

In this paper we propose a standard document retrieval score based on term frequencies. We model the within-document term frequency of each term as a random variable. The standard score is then used to transform each random variable into a regularised form, so that the variables can be combined effectively into a standard document score. The standardisation imposes no constraints on the choice of probability distribution for the term frequencies. We show that the standardisation automatically creates a measure of term specificity. Analysis shows that this measure is highly correlated with the traditional idf measure, and furthermore suggests a novel interpretation and justification of idf-like measures. With experiments on a number of different TREC collections, we show that the standard document score model is comparable with BM25. An advantage of the standard document score model, however, is that the document scores it outputs are dimensionless quantities and are therefore comparable across different queries and collections in certain circumstances.
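
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's exact formulation, which the abstract does not give. It assumes that each term's frequency is standardised as z = (tf - mean) / std, with the mean and standard deviation taken over all documents in the collection (including documents where the term does not occur), and that a document's score is the plain sum of these z-scores over the query terms. The function names (`term_stats`, `z_score`, `standard_document_score`) and the toy collection are hypothetical, and no document-length normalisation is applied.

```python
import math


def term_stats(docs, term):
    """Mean and population standard deviation of a term's frequency
    across all documents (assumption: zero counts are included)."""
    tfs = [doc.count(term) for doc in docs]
    mean = sum(tfs) / len(tfs)
    var = sum((tf - mean) ** 2 for tf in tfs) / len(tfs)
    return mean, math.sqrt(var)


def z_score(tf, mean, std):
    """Standard score of a term frequency; a dimensionless quantity."""
    return (tf - mean) / std


def standard_document_score(query, doc, docs):
    """Sum of standardised term frequencies over the query terms
    (assumption: a simple unweighted sum)."""
    score = 0.0
    for term in query:
        mean, std = term_stats(docs, term)
        if std == 0:
            # The term has the same frequency in every document,
            # so it cannot discriminate between them; skip it.
            continue
        score += z_score(doc.count(term), mean, std)
    return score


# Toy collection: each document is a list of tokens.
docs = [["a", "b"], ["a", "a"], ["b", "c"], ["a", "d"]]

# One occurrence of the rare term "c" versus one occurrence of the common term "a".
rare = z_score(1, *term_stats(docs, "c"))    # about 1.732
common = z_score(1, *term_stats(docs, "a"))  # 0.0
```

In this toy collection, a single occurrence of the rare term "c" receives a larger standardised value than a single occurrence of the common term "a", without any explicit idf factor. This is one way to read the abstract's claim that standardisation yields a measure of term specificity.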