A global-ranking local feature selection method for text categorization

  • Authors:
  • Roberto H. W. Pinheiro;George D. C. Cavalcanti;Renato F. Correa;Tsang Ing Ren

  • Affiliations:
  • Federal University of Pernambuco (UFPE), Center of Informatics (CIn), Av. Jornalista Anibal Fernandes s/n, Cidade Universitária, 50740-560 Recife, PE, Brazil;Federal University of Pernambuco (UFPE), Center of Informatics (CIn), Av. Jornalista Anibal Fernandes s/n, Cidade Universitária, 50740-560 Recife, PE, Brazil;Federal University of Pernambuco (UFPE), Department of Information Science (DCI), Av. da Arquitetura s/n, CAC, Cidade Universitária, 50740-550 Recife, PE, Brazil;Federal University of Pernambuco (UFPE), Center of Informatics (CIn), Av. Jornalista Anibal Fernandes s/n, Cidade Universitária, 50740-560 Recife, PE, Brazil

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2012

Abstract

In this paper, we propose a filtering method for feature selection called ALOFT (At Least One FeaTure). The proposed method focuses on specific characteristics of the text categorization domain. It also ensures that every document in the training set is represented by at least one feature, and it determines the number of selected features in a data-driven way. We compare the effectiveness of the proposed method with that of the Variable Ranking method on three text categorization benchmarks (Reuters-21578, 20 Newsgroups and WebKB), with two classifiers (k-Nearest Neighbor and Naive Bayes) and five feature evaluation functions. The experiments show that ALOFT obtains results equivalent to or better than those of the classical Variable Ranking method.
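
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one way the "at least one feature" guarantee could be met. It assumes that each training document is a set of terms, that a feature evaluation function has already assigned a score to every term, and that each document contributes its single highest-scoring term to the selected set. The name `aloft_select`, the toy documents, and the scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Set


def aloft_select(documents: Sequence[Set[str]], scores: Dict[str, float]) -> List[str]:
    """Return the union of each document's highest-scoring term.

    Assumption: this per-document rule is an illustration of the
    "at least one feature" idea, not the authors' exact procedure.
    """
    selected: List[str] = []
    seen: Set[str] = set()
    for doc in documents:
        candidates = [term for term in doc if term in scores]
        if not candidates:
            continue  # a document with no scored term cannot be covered
        # Sorting first makes ties resolve alphabetically (deterministic).
        best = max(sorted(candidates), key=lambda term: scores[term])
        if best not in seen:
            seen.add(best)
            selected.append(best)
    return selected


# Toy data (hypothetical): three documents and precomputed term scores.
docs = [{"oil", "price"}, {"oil", "barrel"}, {"goal", "match"}]
term_scores = {"oil": 0.9, "price": 0.4, "barrel": 0.3, "goal": 0.8, "match": 0.5}

features = aloft_select(docs, term_scores)
print(features)  # ['oil', 'goal']
```

In this sketch the number of selected features is not a parameter: it falls out of the data, because documents that share a best term add only one feature between them. Variable Ranking, by contrast, keeps a fixed number of top-ranked terms and can leave some documents with no selected feature at all.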