A global-ranking local feature selection method for text categorization

Authors:
Roberto H. W. Pinheiro;George D. C. Cavalcanti;Renato F. Correa;Tsang Ing Ren
Affiliations:
Federal University of Pernambuco (UFPE), Center of Informatics (CIn), Av. Jornalista Anibal Fernandes s/n, Cidade Universitária, 50740-560 Recife, PE, Brazil;Federal University of Pernambuco (UFPE), Center of Informatics (CIn), Av. Jornalista Anibal Fernandes s/n, Cidade Universitária, 50740-560 Recife, PE, Brazil;Federal University of Pernambuco (UFPE), Department of Information Science (DCI), Av. da Arquitetura s/n, CAC, Cidade Universitária, 50740-550 Recife, PE, Brazil;Federal University of Pernambuco (UFPE), Center of Informatics (CIn), Av. Jornalista Anibal Fernandes s/n, Cidade Universitária, 50740-560 Recife, PE, Brazil
Venue:
Expert Systems with Applications: An International Journal
Year:
2012

Abstract

In this paper, we propose a filter method for feature selection called ALOFT (At Least One FeaTure). The method focuses on specific characteristics of the text categorization domain. It ensures that every document in the training set is represented by at least one feature, and it determines the number of selected features in a data-driven way. We compare the proposed method with the Variable Ranking method on three text categorization benchmarks (Reuters-21578, 20 Newsgroups and WebKB), using two classifiers (k-Nearest Neighbor and Naive Bayes) and five feature evaluation functions. The experiments show that ALOFT obtains results equivalent to or better than those of classical Variable Ranking.
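
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one plausible reading of the "at least one feature" idea: for each training document, keep the highest-scoring term that the document contains. The function name `at_least_one_feature`, the set-of-terms document representation, and the toy scores are assumptions made for illustration; the scores stand in for the output of any feature evaluation function and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Set


def at_least_one_feature(
    docs: Sequence[Set[str]], scores: Dict[str, float]
) -> List[str]:
    """Pick, for every document, its best-scoring term (hypothetical sketch).

    docs   -- training documents, each given as a set of terms
    scores -- global score per term from some feature evaluation function
    """
    selected: Set[str] = set()
    for doc in docs:
        # Only terms that actually have a score can be chosen.
        candidates = [term for term in doc if term in scores]
        if not candidates:
            continue  # empty or fully unscored document: nothing to pick
        # Ties on score are broken by term so the result is deterministic.
        best = max(candidates, key=lambda term: (scores[term], term))
        selected.add(best)
    # Return the chosen terms ordered by the global ranking, best first.
    return sorted(selected, key=lambda term: (-scores[term], term))


# Toy data (invented for illustration only).
docs = [{"wheat", "price"}, {"price", "stock"}, {"goal", "match"}]
scores = {"wheat": 0.9, "price": 0.4, "stock": 0.7, "goal": 0.8, "match": 0.3}
selected = at_least_one_feature(docs, scores)
```

Under this reading, the number of selected features is never set by hand: it equals the number of distinct terms chosen across documents, and every document that contains a scored term keeps at least one of them. A plain top-k cut of the global ranking gives no such guarantee, since a document whose terms all rank below the cut would end up with an empty representation.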