A text categorization method based on local document frequency

  • Authors:
  • Feng Xia; Tian Jicun; Liu Zhihui

  • Affiliations:
  • School of Computer Science and Technology, Civil Aviation University of China, Tianjin, P.R. China; School of Computer Science and Technology, Civil Aviation University of China, Tianjin, P.R. China; School of Computer Science and Technology, Civil Aviation University of China, Tianjin, P.R. China

  • Venue:
  • FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 7
  • Year:
  • 2009

Abstract

In this paper, a fast and effective text categorization method named TCBLDF is proposed. TCBLDF needs almost no dimensionality reduction beyond stop-word removal and document-frequency-based feature selection. It tries to capture the relationship between a term and a category label, and thus eliminates the need to know the semantic contribution a term makes to a document in which it occurs. TCBLDF uses a measure to evaluate the importance of each term for the categorization task and then weights each term according to that evaluation, so that important terms have more influence on the classification decision. Finally, we compare the method with two conventional classification methods, Naive Bayes and a linear SVM. Experimental results show that TCBLDF is faster than the SVM with comparable performance and more effective than Naive Bayes, and can thus be a good alternative to these methods.
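
The abstract names two preprocessing steps, stop-word removal and document-frequency-based feature selection, and the title refers to a "local" document frequency. The Python sketch below illustrates only those generic steps; it is not the TCBLDF method itself, whose importance measure and weighting scheme are not given in the abstract. The stop-word list, the `min_df` threshold, the whitespace tokenizer, all function names, and the reading of "local" as "counted within one category" are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical stop-word list; the list used in the paper is not stated.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def select_features(docs, min_df=2):
    """Keep terms that occur in at least min_df documents (assumed threshold)."""
    df = document_frequency(docs)
    return {term for term, n in df.items() if n >= min_df}


def local_document_frequency(docs, labels, vocab):
    """Per-category document frequency, restricted to the selected vocabulary.

    'Local' is interpreted here as 'counted within one category'; this is
    an assumption, not a definition taken from the paper.
    """
    local = {}
    for doc, label in zip(docs, labels):
        counts = local.setdefault(label, Counter())
        counts.update(set(tokenize(doc)) & vocab)
    return local
```

Given a labeled corpus, `select_features` produces the reduced vocabulary and `local_document_frequency` returns one `Counter` per category, which a term-importance measure could then take as input.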