A nonparametric term weighting method for information retrieval based on measuring the divergence from independence

  • Authors:
  • İlker Kocabaş;Bekir Taner Dinçer;Bahar Karaoğlan

  • Affiliations:
  • International Computer Institute, Ege University, Bornova, Izmir, Turkey;Department of Statistics, Muğla University, Mugla, Turkey and Department of Computer Engineering, Muğla University, Mugla, Turkey;International Computer Institute, Ege University, Bornova, Izmir, Turkey

  • Venue:
  • Information Retrieval
  • Year:
  • 2014

Abstract

In this article, we introduce an out-of-the-box automatic term weighting method for information retrieval. The method measures how far a term's frequency of occurrence in a document diverges from the frequency that would be expected if terms and documents were independent. Divergence from independence has a well-established underlying statistical theory. It provides a plain, mathematically tractable, and nonparametric way of term weighting and, moreover, requires no term frequency normalization. Besides its sound theoretical background, experiments on TREC test collections show that its performance is generally comparable to that of state-of-the-art term weighting methods. In both its theoretical and practical aspects, it is a simple but powerful baseline alternative to the state-of-the-art methods.
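
The abstract gives no formula, so the following is only an illustrative sketch of how a divergence-from-independence weight could be computed, not the authors' exact definition. It assumes a standardized form: the expected frequency under independence is the term's collection frequency scaled by the document's share of the collection, and the weight is the log of the standardized excess over that expectation. The function name `dfi_weight` and its parameters are hypothetical.

```python
import math


def dfi_weight(tf, doc_len, coll_tf, coll_len):
    """Hypothetical divergence-from-independence weight for one term in one document.

    tf       -- occurrences of the term in the document
    doc_len  -- number of tokens in the document
    coll_tf  -- occurrences of the term in the whole collection
    coll_len -- number of tokens in the whole collection
    """
    # Frequency expected if term and document were independent (assumed form).
    expected = coll_tf * doc_len / coll_len
    # A term occurring no more often than chance carries no weight here.
    if expected <= 0 or tf <= expected:
        return 0.0
    # Standardized excess over the expectation, damped by a logarithm.
    z = (tf - expected) / math.sqrt(expected)
    return math.log2(z + 1.0)


# Usage: a 100-token document in a 10,000-token collection.
frequent_here = dfi_weight(tf=5, doc_len=100, coll_tf=50, coll_len=10_000)
at_chance = dfi_weight(tf=1, doc_len=200, coll_tf=100, coll_len=10_000)
```

Note that the sketch has no tunable parameters and no separate length-normalization step: document length enters only through the expected frequency, which is in keeping with the nonparametric, normalization-free character the abstract describes.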