An examination of feature selection frameworks in text categorization

Authors:
Bong Chih How;Wong Ting Kiong
Affiliations:
Faculty of Computer Science and Information Technology, Kota Samarahan, Sarawak, Malaysia;Faculty of Computer Science and Information Technology, Kota Samarahan, Sarawak, Malaysia
Venue:
AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology
Year:
2005

Abstract

Feature selection, an important task in text categorization, is used for dimensionality reduction. It can be performed locally or globally. In local selection, a distinct feature set is derived for each class, so the number of feature sets depends on the number of classes. In contrast, global feature selection uses a single universal feature set, which is assumed to preserve the characteristics of all classes. Furthermore, feature selection can be based on the relevant feature set only (local dictionary) or on both the relevant and irrelevant feature sets (universal dictionary). In this paper, we explore these feature selection frameworks for text categorization on the Reuters(10) and Reuters(115) datasets (variants of the Reuters-21578 corpus). We then investigate the efficiency of 7 different feature selection methods, applied locally or globally, in combination with the local and universal dictionaries. Our experiments show that local feature selection with a local dictionary yields the best categorization results.
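
The distinction between local and global selection, and between local and universal dictionaries, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the seven selection methods, so the chi-square score used here is an assumption, as is combining per-class scores by their maximum for global selection. The toy corpus and all function names (`chi2`, `dictionary`, `local_selection`, `global_selection`) are hypothetical.

```python
# Toy corpus (hypothetical): each document is (set of terms, class label).
DOCS = [
    ({"wheat", "grain", "export"}, "grain"),
    ({"wheat", "harvest", "tonnes"}, "grain"),
    ({"oil", "crude", "barrel"}, "crude"),
    ({"oil", "export", "opec"}, "crude"),
]


def chi2(term, cls, docs):
    """Chi-square of the term/class 2x2 table (assumed scoring metric)."""
    n = len(docs)
    a = sum(1 for terms, label in docs if term in terms and label == cls)
    b = sum(1 for terms, label in docs if term in terms and label != cls)
    c = sum(1 for terms, label in docs if term not in terms and label == cls)
    d = n - a - b - c
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def dictionary(cls, docs, universal):
    """Candidate terms: from all documents (universal dictionary) or only
    from documents of the class itself (local dictionary)."""
    pool = docs if universal else [doc for doc in docs if doc[1] == cls]
    return set().union(*(terms for terms, _ in pool))


def top_k(scores, k):
    """Return the k best-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


def local_selection(docs, k, universal=False):
    """Local selection: one distinct feature set per class."""
    classes = sorted({label for _, label in docs})
    return {
        cls: top_k({t: chi2(t, cls, docs) for t in dictionary(cls, docs, universal)}, k)
        for cls in classes
    }


def global_selection(docs, k, universal=True):
    """Global selection: a single feature set shared by all classes.
    Per-class scores are combined by taking the maximum (an assumption)."""
    classes = sorted({label for _, label in docs})
    scores = {}
    for cls in classes:
        for term in dictionary(cls, docs, universal):
            scores[term] = max(scores.get(term, 0.0), chi2(term, cls, docs))
    return top_k(scores, k)


if __name__ == "__main__":
    print(local_selection(DOCS, k=2))                  # local dictionary
    print(local_selection(DOCS, k=2, universal=True))  # universal dictionary
    print(global_selection(DOCS, k=2))
```

In this toy setting, switching the local selection to a universal dictionary lets a term that never occurs in a class (for example `oil` for the `grain` class) enter that class's feature set, because chi-square also rewards negative association. This is one way to read the abstract's "both relevant and irrelevant" feature sets, not a claim about the paper's exact procedure.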