Semi-subsumed Events: A Probabilistic Semantics of the BM25 Term Frequency Quantification

  • Authors:
  • Hengzhi Wu; Thomas Roelleke

  • Affiliations:
  • Queen Mary, University of London; Queen Mary, University of London

  • Venue:
  • ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
  • Year:
  • 2009

Abstract

Through BM25, the asymptotic term frequency quantification TF = tf/(tf + K), where tf is the within-document term frequency and K is a normalisation factor, became popular. This paper reports a finding regarding the meaning of the TF quantification: in the triangle of independence and subsumption, the TF quantification forms the altitude, that is, the middle between independent and subsumed events. We refer to this new assumption as semi-subsumed. While this finding of a well-defined probabilistic assumption solves the probabilistic interpretation of the BM25 TF quantification, it is also of wider impact regarding probability theory.
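
As a minimal sketch of the quantification named in the abstract, the following Python snippet evaluates TF = tf/(tf + K) for a few term frequencies. The function name `tf_quantification` and the choice to treat K as a fixed positive constant are assumptions made for illustration; the abstract does not say how K is set, and the snippet does not reproduce the paper's probabilistic derivation.

```python
def tf_quantification(tf: float, k: float) -> float:
    """Return the asymptotic quantification tf / (tf + K).

    `k` stands for the normalisation factor K; treating it as a plain
    positive constant is an assumption made for this sketch.
    """
    if tf < 0:
        raise ValueError("tf must be non-negative")
    if k <= 0:
        raise ValueError("K must be positive")
    return tf / (tf + k)


if __name__ == "__main__":
    # The value rises quickly for small tf and flattens out below 1,
    # which is the saturation behaviour the abstract calls asymptotic.
    for tf in (0, 1, 2, 5, 10, 100):
        print(tf, round(tf_quantification(tf, k=1.0), 3))
```

Two properties follow directly from the formula: the value is exactly 0.5 when tf equals K, and it approaches but never reaches 1 as tf grows.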