Is document frequency important for PRF?

  • Authors:
  • Stéphane Clinchant; Eric Gaussier

  • Affiliations:
  • Xerox Research Center Europe, Meylan, France and LIG Université de Grenoble, UMR; LIG Université de Grenoble, UMR

  • Venue:
  • ICTIR'11: Proceedings of the Third International Conference on Advances in Information Retrieval Theory
  • Year:
  • 2011

Abstract

In this paper, we introduce a new heuristic constraint for pseudo-relevance feedback (PRF) models, referred to as the Document Frequency (DF) constraint, and validate it through a series of experiments with an oracle. We then analyze, from a theoretical point of view, how state-of-the-art PRF models relate to this constraint. This analysis reveals that, contrary to several recently proposed models, the standard mixture model for PRF in the language modeling family does not satisfy the DF constraint. Lastly, we further validate the constraint through tests with a simple family of tf-idf functions whose parameter controls the degree to which the DF constraint is satisfied.
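To make the DF constraint concrete, here is a minimal Python sketch. It reads the constraint as follows: among terms with the same IDF and the same total frequency in the feedback set F, a term that occurs in more feedback documents should receive a higher feedback weight. The weighting family below, FW(w) = IDF(w) * sum over d in F of tf(w, d)^h, is an assumption made for illustration and is not taken from the paper. The exponent `h`, the function name `feedback_weight`, and the toy frequencies are all hypothetical. The exponent plays the role of "a parameter controlling the satisfaction of the DF constraint": for 0 < h < 1, each per-document contribution is strictly concave, so spreading the same total frequency over more documents increases the weight. For h = 1, the family collapses to a plain tf-idf sum that ignores document frequency.

```python
import math
from typing import Sequence


def feedback_weight(tfs: Sequence[float], idf: float, h: float) -> float:
    """Hypothetical PRF term weight: idf * sum_d tf(w, d) ** h.

    `tfs` holds the term's frequency in each feedback document (0 if absent).
    `h` is an illustrative parameter in (0, 1]: h < 1 rewards terms whose
    occurrences are spread over more feedback documents (DF constraint);
    h = 1 gives a plain tf-idf sum that is indifferent to document frequency.
    """
    if not 0 < h <= 1:
        raise ValueError("h must lie in (0, 1]")
    return idf * sum(t ** h for t in tfs if t > 0)


if __name__ == "__main__":
    # Toy feedback set of three documents (hypothetical numbers).
    # Both terms share the same IDF and total frequency (5), but term b
    # occurs in one more feedback document than term a.
    tfs_a = [3, 2, 0]
    tfs_b = [3, 1, 1]
    idf = math.log(1000 / 50)  # same IDF assumed for both terms

    for h in (0.5, 1.0):
        wa = feedback_weight(tfs_a, idf, h)
        wb = feedback_weight(tfs_b, idf, h)
        print(f"h={h}: FW(a)={wa:.3f}, FW(b)={wb:.3f}, b preferred: {wb > wa}")
```

With h = 0.5 the sketch ranks term b above term a, as the DF constraint asks. With h = 1 the two weights coincide, which shows how a linear tf-idf sum fails to distinguish the two terms.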