Term Frequency Normalization via Pareto Distributions

  • Authors:
  • Gianni Amati; C. J. van Rijsbergen

  • Venue:
  • Proceedings of the 24th BCS-IRSG European Colloquium on IR Research: Advances in Information Retrieval
  • Year:
  • 2002

Abstract

We exploit the Feller-Pareto characterization of the classical Pareto distribution to derive a law relating the probability of a given term frequency in a document to the document's length. A similar law was derived by Mandelbrot. We use the Paretian distribution to obtain a term frequency normalization that substitutes for the actual term frequency in the probabilistic models of Information Retrieval recently introduced in TREC-10. Preliminary results show that the single parameter of the framework can be eliminated in favour of the term frequency normalization derived from the Paretian law.