A term weighting approach for text categorization

  • Authors:
  • Kyung-Chan Lee;Seung-Shik Kang;Kwang-Soo Hahn

  • Affiliations:
  • School of Computer Science, Kookmin University & AITrc, Seoul, Korea;School of Computer Science, Kookmin University & AITrc, Seoul, Korea;School of Computer Science, Kookmin University & AITrc, Seoul, Korea

  • Venue:
  • AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology
  • Year:
  • 2005

Abstract

Representative words in a document are commonly identified and discriminated by the statistical distribution of their frequencies. We assume that evaluating a confidence measure for terms through content-based document analysis leads to better performance than the parametric assumptions of the standard frequency-based method. In this paper, we propose a new term weighting method that replaces the frequency-based probabilistic methods. Experiments with Naïve Bayesian classifiers showed that our approach improved on the frequency-based method at every evaluation point.
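
The abstract does not specify how the confidence measure is computed, so the following is only a minimal sketch of the general idea: a multinomial Naïve Bayes classifier in which the evidence contributed by each term comes from a pluggable weighting function rather than being fixed to raw frequency. The names `train_weighted_nb`, `predict`, `weight_fn`, and `raw_frequency` are hypothetical, as is the Laplace smoothing parameter `alpha`; none of them is taken from the paper, and this is not the authors' method.

```python
import math
from collections import defaultdict


def train_weighted_nb(docs, labels, weight_fn, alpha=1.0):
    """Fit a multinomial Naive Bayes model on weighted term evidence.

    docs: list of token lists; labels: one class label per document.
    weight_fn(term, doc) -> non-negative float. This is a hypothetical
    stand-in for a term confidence score; the paper's measure is not
    given in the abstract.
    """
    vocab = {t for doc in docs for t in doc}
    class_docs = defaultdict(int)
    term_mass = defaultdict(lambda: defaultdict(float))
    for doc, y in zip(docs, labels):
        class_docs[y] += 1
        for t in set(doc):
            # Accumulate the weight of each distinct term once per document.
            term_mass[y][t] += weight_fn(t, doc)

    model = {}
    n_docs = len(docs)
    for y in class_docs:
        # Laplace-smoothed normaliser over the whole vocabulary.
        total = sum(term_mass[y].values()) + alpha * len(vocab)
        log_lik = {t: math.log((term_mass[y][t] + alpha) / total) for t in vocab}
        model[y] = (math.log(class_docs[y] / n_docs), log_lik)
    return model


def predict(model, doc, weight_fn):
    """Return the class with the highest weighted log-posterior score."""
    scores = {}
    for y, (log_prior, log_lik) in model.items():
        # Terms unseen during training are skipped.
        scores[y] = log_prior + sum(
            weight_fn(t, doc) * log_lik[t] for t in set(doc) if t in log_lik
        )
    return max(scores, key=scores.get)


def raw_frequency(term, doc):
    """Frequency-based baseline: weight a term by its count in the document."""
    return float(doc.count(term))
```

With `raw_frequency` as the weighting function, the sketch reduces to the standard frequency-based multinomial model; passing a different `weight_fn` (for example, one derived from content-based document analysis) changes only the evidence each term contributes, which makes the two settings straightforward to compare on the same classifier.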