Semi-parametric and Non-parametric Term Weighting for Information Retrieval

  • Authors:
  • Donald Metzler; Hugo Zaragoza

  • Affiliations:
  • Yahoo! Research; Yahoo! Research

  • Venue:
  • ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
  • Year:
  • 2009

Abstract

Most previous research on term weighting for information retrieval has focused on developing specialized parametric term weighting functions. Examples include TF.IDF vector-space formulations, BM25, and language modeling weighting. Each of these term weighting functions takes on a specific parametric form. While these weighting functions have proven to be highly effective, they impose strict constraints on the functional form of the term weights. Such constraints may degrade retrieval effectiveness. In this paper we propose two new classes of term weighting schemes that we call semi-parametric and non-parametric weighting. These weighting schemes make fewer assumptions about the underlying term weights and allow the data to speak for itself. We argue that these robust weighting schemes have the potential to be significantly more effective than existing parametric schemes, especially as more training data becomes available.
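
To make "a specific parametric form" concrete, the following minimal sketch shows textbook variants of the three weighting functions named in the abstract. These are common formulations and may differ from the exact ones the paper considers; the function names, argument names, and default parameter values (k1 = 1.2, b = 0.75, mu = 2000) are illustrative assumptions, not taken from the paper.

```python
import math


def tf_idf_weight(tf, df, n_docs):
    """One common TF.IDF variant: log-scaled term frequency times IDF.

    tf: term frequency in the document, df: number of documents
    containing the term, n_docs: collection size.
    """
    if tf == 0:
        return 0.0
    return (1.0 + math.log(tf)) * math.log(n_docs / df)


def bm25_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """A widely used BM25 variant; k1 and b are its free parameters.

    The "+ 1.0" inside the log keeps the IDF component non-negative
    (an assumption of this sketch; other variants omit it).
    """
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


def dirichlet_lm_weight(tf, cf, collection_len, doc_len, mu=2000.0):
    """Log of the Dirichlet-smoothed language model probability P(term | doc).

    cf: total occurrences of the term in the collection,
    collection_len: total number of tokens in the collection.
    """
    p_collection = cf / collection_len
    return math.log((tf + mu * p_collection) / (doc_len + mu))
```

In each case the shape of the weight as a function of term frequency is fixed in advance (logarithmic, saturating, or smoothed-ratio), and only a handful of parameters can be tuned. This is the kind of constraint on functional form that the abstract refers to.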