An Empirical Study of Feature Selection for Text Categorization based on Term Weightage

Authors:
Bong Chih How;K. Narayanan
Affiliations:
Universiti Malaysia Sarawak;Universiti Malaysia Sarawak
Venue:
WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
Year:
2004

Abstract

This paper proposes a local feature selection (FS) measure, Categorical Descriptor Term (CTD), for text categorization. It is derived from the classic term weighting scheme TFIDF. The method explicitly chooses a feature set for each category by selecting terms only from the relevant category. Although past literature has suggested that using features from irrelevant categories can improve text categorization performance, we believe that incorporating only relevant features can be highly effective. The experimental comparison is carried out between CTD and five well-known feature selection measures: Information Gain, Chi-Square, Correlation Coefficient, Odds Ratio and GSS Coefficient. The results show that our proposed method performs comparably well against the other FS measures, especially on collections with highly overlapping topics.
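
To make the idea of local, per-category selection concrete, the following is a minimal sketch of a TFIDF-style scorer that ranks terms for each category using only the documents of that category. It is not the CTD formula, which the abstract does not state. The scoring rule (term frequency within the category multiplied by the log of the inverse category frequency), the function name `select_features_per_category`, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def select_features_per_category(docs, labels, k):
    """Return the top-k terms per category, scored from that category's documents.

    Assumed score (illustrative only, not the paper's CTD definition):
        tf(term, category) * log(n_categories / categories_containing(term))
    """
    # Term frequency of each term inside each category.
    tf = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        tf[label].update(tokens)

    # Number of categories in which each term occurs at least once.
    cat_freq = Counter()
    for counts in tf.values():
        cat_freq.update(counts.keys())

    n_cats = len(tf)
    selected = {}
    for label, counts in tf.items():
        scores = {
            term: count * math.log(n_cats / cat_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected[label] = ranked[:k]
    return selected


# Toy corpus (hypothetical): tokenized documents with one label each.
docs = [
    ["stock", "market", "price", "rise"],
    ["market", "price", "fall", "stock"],
    ["match", "goal", "team", "price"],
    ["team", "goal", "win", "match"],
]
labels = ["finance", "finance", "sport", "sport"]

selected = select_features_per_category(docs, labels, k=2)
print(selected)
```

In this toy corpus the term "price" occurs in both categories, so its inverse category frequency is zero and it is never selected. This mirrors the intuition that a term shared across topics describes no single category well.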