Composition of TF normalizations: new insights on scoring functions for ad hoc IR

  • Authors:
  • François Rousseau;Michalis Vazirgiannis

  • Affiliations:
  • École Polytechnique, Palaiseau, France;Athens University of Economics and Business & École Polytechnique & Télécom ParisTech, Athens, Greece

  • Venue:
  • Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Previous papers in ad hoc IR reported that scoring functions should satisfy a set of heuristic retrieval constraints, providing a mathematical justification for the normalizations historically applied to the term frequency (TF). In this paper, we propose a further level of abstraction, claiming that the successive normalizations are carried out through composition. Thus we introduce a principled framework that fully explains BM25 as a variant of TF-IDF with an inverse order of function composition. Our experiments over standard datasets indicate that the respective orders of composition chosen in the original papers for both TF-IDF and BM25 are the most effective ones. Moreover, since the order is different between the two models, they also demonstrated that the order is instrumental in the design of weighting models. In fact, while considering more complex scoring functions such as BM25+, we discovered a novel weighting model in terms of order of composition that consistently outperforms all the rest. Our contribution here is twofold: we provide a unifying mathematical framework for IR and a novel scoring function discovered using this framework.