Analytical evaluation of term weighting schemes for text categorization

  • Authors:
  • Hakan Altınçay; Zafer Erenel

  • Affiliations:
  • Department of Computer Engineering, Eastern Mediterranean University, Famagusta, Northern Cyprus, Turkey (both authors)

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2010

Abstract

An analytical evaluation of six widely used term weighting techniques for text categorization is presented. The analysis is based on expressing the term weights through term occurrence probabilities in the positive and negative categories. The weighting behavior of each scheme is first clarified by analyzing the relation between the occurrence probabilities of terms that receive equal weights. The weights are then expressed in terms of the ratio and the difference of the term occurrence probabilities, which reveals the similarities and differences among the schemes. Simulations show that the relative performance of the schemes can be explained by how each one uses the ratio and the difference of term occurrence probabilities to generate term weights.
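To make the probabilistic view concrete, here is a minimal Python sketch. It assumes the notation p = P(term present | positive category) and q = P(term present | negative category), and it uses two familiar schemes, the odds ratio and the χ² statistic, purely as illustrations. The abstract does not list the six schemes analyzed, so these choices and all function names (`ratio`, `difference`, `odds_ratio`, `chi_square`, `chi_square_counts`) are hypothetical and are not the authors' formulation. The sketch shows that χ² for a 2x2 term/category table reduces to a normalized squared difference (p - q)², whereas the odds ratio is driven mainly by the ratio p/q.

```python
# Hypothetical notation for this sketch (not taken from the paper):
#   p = P(term present | positive category)
#   q = P(term present | negative category)
#   prior = P(positive category), n_docs = number of training documents


def ratio(p: float, q: float) -> float:
    """Ratio of occurrence probabilities (assumes q > 0)."""
    return p / q


def difference(p: float, q: float) -> float:
    """Difference of occurrence probabilities."""
    return p - q


def odds_ratio(p: float, q: float) -> float:
    """Odds ratio written with p and q (assumes 0 < p < 1 and 0 < q < 1)."""
    return (p * (1.0 - q)) / ((1.0 - p) * q)


def chi_square(p: float, q: float, prior: float, n_docs: int) -> float:
    """Chi-square statistic of the 2x2 term/category table, in terms of p and q.

    The numerator reduces to the squared difference (p - q)^2; the
    denominator normalizes by the term's overall occurrence probability.
    """
    p_term = prior * p + (1.0 - prior) * q  # P(term present) over all documents
    return n_docs * prior * (1.0 - prior) * (p - q) ** 2 / (p_term * (1.0 - p_term))


def chi_square_counts(a: int, b: int, c: int, d: int) -> float:
    """Standard 2x2 chi-square from document counts, for cross-checking.

    a = positive docs containing the term, b = negative docs containing it,
    c = positive docs without it, d = negative docs without it.
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    # Two hypothetical terms with the same ratio p/q = 2 but different differences.
    for p, q in [(0.2, 0.1), (0.02, 0.01)]:
        print(
            f"p={p}, q={q}, ratio={ratio(p, q):.2f}, diff={difference(p, q):.2f}, "
            f"OR={odds_ratio(p, q):.3f}, chi2={chi_square(p, q, 0.5, 1000):.2f}"
        )
```

With balanced categories and 1000 documents, both hypothetical terms share the ratio 2, so the odds ratio changes little (2.25 versus about 2.02). By contrast, χ² falls from about 19.6 to about 1.7 because the difference shrinks tenfold. This is the kind of ratio-versus-difference contrast the abstract refers to.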