A General Framework of Feature Selection for Text Categorization
MLDM '09: Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition
This paper proposes a term weighting scheme, Categorical Term Descriptor (CTD), for feature selection (FS) in automated text categorization. CTD is an adaptation of Term Frequency Inverse Document Frequency (TFIDF). We compared the performance of the proposed method against classical methods such as Correlation Coefficient, Chi-Square and Information Gain, using the Multinomial Naïve Bayes and Support Vector Machine (SVM) classifiers on the Reuters [10] and Reuters [115] variants of the Reuters-21578 dataset. Despite its simplicity, CTD proved promising for both local and global feature selection. CTD works best on Reuters [10] as a stable local FS method.
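To make the distinction between local and global feature selection concrete, the sketch below scores terms with Chi-Square, one of the classical baselines named above. It does not implement CTD, whose formula is not given in this abstract. The function names, the toy corpus, and the choice of taking the maximum over categories as the global score are all assumptions made for illustration, not details from the paper.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square score of a term for a category from a 2x2 table of document counts.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def local_scores(docs, labels, category):
    """Local FS: score every term against a single category."""
    n_docs = len(docs)
    n_cat = sum(1 for y in labels if y == category)
    df_all = Counter()  # document frequency over the whole corpus
    df_cat = Counter()  # document frequency inside the category
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            df_all[term] += 1
            if y == category:
                df_cat[term] += 1
    scores = {}
    for term, df in df_all.items():
        a = df_cat[term]
        b = df - a
        c = n_cat - a
        d = n_docs - n_cat - b
        scores[term] = chi_square(a, b, c, d)
    return scores


def global_scores(docs, labels):
    """Global FS: one score per term, here the maximum of its local scores (an assumed choice)."""
    combined = {}
    for category in set(labels):
        for term, score in local_scores(docs, labels, category).items():
            combined[term] = max(combined.get(term, 0.0), score)
    return combined


# Toy corpus, invented for illustration (not Reuters data).
docs = [
    ["wheat", "price", "rise"],
    ["wheat", "export"],
    ["oil", "price", "fall"],
    ["oil", "export"],
]
labels = ["grain", "grain", "crude", "crude"]

grain = local_scores(docs, labels, "grain")
overall = global_scores(docs, labels)
```

In this toy corpus, "wheat" appears in every "grain" document and nowhere else, so it receives the highest local score for that category, while "price" is spread evenly across both categories and scores zero. A local method keeps a separate ranked term list per category; a global method keeps one list for the whole corpus.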