A parallel derivation of probabilistic information retrieval models

  • Authors:
  • Thomas Roelleke;Jun Wang

  • Affiliations:
  • Queen Mary, University of London;Queen Mary, University of London

  • Venue:
  • SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper investigates in a stringent athematical formalism the parallel derivation of three grand probabilistic retrieval models: binary independent retrieval (BIR), Poisson model (PM), and language modelling (LM).The investigation has been motivated by a number of questions. Firstly, though sharing the same origin, namely the probability of relevance, the models differ with respect to event spaces. How can this be captured in a consistent notation, and can we relate the event spaces? Secondly, BIR and PM are closely related, but how does LM fit in? Thirdly, how are tf-idf and probabilistic models related? .The parallel investigation of the models leads to a number of formalised results: BIR and PM assume the collection to be a set of non-relevant documents, whereas LM assumes the collection to be a set of terms from relevant documents.PM can be viewed as a bridge connecting BIR and LM.A BIR-LM equivalence explains BIR as a special LM case.PM explains tf-idf, and both, BIR and LM probabilities express tf-idf in a dual way..