Categorical Term Descriptor: A Proposed Term Weighting Scheme for Feature Selection

  • Authors:
  • Bong Chih How;Narayanan Kulathuramaiyer;Wong Ting Kiong

  • Affiliations:
  • Universiti Malaysia Sarawak;Universiti Malaysia Sarawak;Universiti Malaysia Sarawak

  • Venue:
  • WI '05 Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence
  • Year:
  • 2005

Abstract

This paper proposes a term weighting scheme, Categorical Term Descriptor (CTD), for feature selection in automated text categorization. CTD is an adaptation of Term Frequency Inverse Document Frequency (TFIDF). We compared the performance of the proposed method against classical methods such as Correlation Coefficient, Chi-Square and Information Gain, using the Multinomial Naïve Bayes and Support Vector Machine (SVM) classifiers on the Reuters [10] and Reuters [115] variants of the Reuters-21578 dataset. Despite its simplicity, CTD proved promising for both local and global feature selection. It works best on Reuters [10] as a stable local feature selection method.
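
To make the idea of local feature selection concrete, the following is a minimal sketch of one of the classical baselines named above, Chi-Square, scored per category. It is not CTD, whose formula is not given in this abstract. The function names `chi_square` and `select_top_k`, the tokenized input format, and the toy documents are assumptions made for illustration only.

```python
from collections import Counter


def chi_square(docs, labels, category):
    """Chi-square score of every term with respect to one category.

    docs   : list of token lists (hypothetical, already tokenized input)
    labels : one category label per document
    Scoring terms against a single category is what makes this "local".
    """
    n = len(docs)
    in_cat = sum(1 for y in labels if y == category)
    df_all = Counter()  # number of documents containing the term
    df_cat = Counter()  # number of `category` documents containing the term
    for tokens, y in zip(docs, labels):
        for t in set(tokens):  # document frequency: count each term once
            df_all[t] += 1
            if y == category:
                df_cat[t] += 1

    scores = {}
    for t, df in df_all.items():
        a = df_cat[t]        # in category, contains term
        b = df - a           # outside category, contains term
        c = in_cat - a       # in category, lacks term
        d = n - in_cat - b   # outside category, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[t] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Toy corpus, invented for this sketch.
    docs = [["wheat", "export"], ["wheat", "farm"],
            ["oil", "export"], ["oil", "price"]]
    labels = ["grain", "grain", "crude", "crude"]
    scores = chi_square(docs, labels, "grain")
    print(select_top_k(scores, 2))
```

Chi-Square is symmetric, so a term that reliably signals the absence of a category scores as high as one that signals its presence. A global variant would combine the per-category scores, for example by taking the maximum or a weighted average over categories.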