A GRASP algorithm for fast hybrid (filter-wrapper) feature subset selection in high-dimensional datasets

  • Authors:
  • Pablo Bermejo; Jose A. Gámez; Jose M. Puerta

  • Affiliations:
  • Intelligent Systems and Data Mining Laboratory, Computing Systems Department, Universidad de Castilla-La Mancha, Albacete 02071, Spain (all three authors)

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2011

Abstract

Feature subset selection (FSS) is a key problem in the data-mining classification task: it helps to obtain more compact and understandable models without degrading (or even while improving) their performance. In this work we focus on FSS in high-dimensional datasets, that is, datasets with a very large number of predictive attributes. In this setting, standard sophisticated wrapper algorithms cannot be applied because of their computational complexity, and computationally lighter filter-wrapper algorithms have recently been proposed. We propose a stochastic algorithm based on the GRASP (Greedy Randomized Adaptive Search Procedure) meta-heuristic, whose main goal is to speed up the feature subset selection process, chiefly by reducing the number of wrapper evaluations that must be carried out. GRASP is a multi-start constructive method: in each start it first builds a solution and then runs an improvement stage over that solution. Several instances of the proposed GRASP method are tested experimentally and compared with state-of-the-art algorithms over 12 high-dimensional datasets. The statistical analysis of the results shows that our proposal is comparable to previous algorithms in both accuracy and cardinality of the selected subset, but requires significantly fewer evaluations.
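To make the two GRASP stages concrete, here is a minimal, generic sketch of a hybrid filter-wrapper GRASP loop. It is not the authors' algorithm. The per-feature filter scores, the restricted candidate list of fixed size `rcl_size`, the single-feature-removal hill climbing used as the improvement stage, and the toy `toy_wrapper` evaluator are all illustrative assumptions. The wrapper is a plain callable standing in for an expensive classifier evaluation (e.g. cross-validated accuracy). Calls to it are counted, because the number of wrapper evaluations is the cost the abstract aims to reduce.

```python
import random
from typing import Callable, FrozenSet, Sequence, Tuple


def grasp_fss(
    filter_scores: Sequence[float],
    evaluate: Callable[[FrozenSet[int]], float],
    iterations: int = 10,
    rcl_size: int = 3,
    seed: int = 0,
) -> Tuple[FrozenSet[int], float, int]:
    """Hypothetical GRASP sketch for filter-wrapper feature subset selection.

    filter_scores: cheap per-feature relevance from a filter (higher is better).
    evaluate: the expensive wrapper score of a feature subset (higher is better).
    Returns (best_subset, best_score, number_of_wrapper_evaluations).
    """
    rng = random.Random(seed)
    n_evals = 0

    def wrapper(subset: FrozenSet[int]) -> float:
        nonlocal n_evals
        n_evals += 1  # count every (costly) wrapper evaluation
        return evaluate(subset)

    best_subset, best_score = frozenset(), float("-inf")
    for _ in range(iterations):  # multi-start
        # Construction stage (greedy randomized): rank features by the filter,
        # pick one at random from the top rcl_size remaining candidates, and
        # keep it only if the wrapper score strictly improves.
        remaining = sorted(range(len(filter_scores)), key=lambda f: -filter_scores[f])
        current, current_score = frozenset(), float("-inf")
        while remaining:
            f = rng.choice(remaining[:rcl_size])
            remaining.remove(f)
            candidate = current | {f}
            score = wrapper(candidate)
            if score > current_score:
                current, current_score = candidate, score

        # Improvement stage: first-improvement hill climbing that tries to
        # drop one selected feature at a time.
        improved = True
        while improved and len(current) > 1:
            improved = False
            for f in sorted(current):
                candidate = current - {f}
                score = wrapper(candidate)
                if score > current_score:
                    current, current_score = candidate, score
                    improved = True
                    break

        if current_score > best_score:
            best_subset, best_score = current, current_score
    return best_subset, best_score, n_evals


# Toy usage: features 0, 1, 2 are (hypothetically) the relevant ones.
RELEVANT = {0, 1, 2}
FILTER_SCORES = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.05, 0.15, 0.25, 0.0]


def toy_wrapper(subset: FrozenSet[int]) -> float:
    # Stand-in for a classifier's accuracy: reward relevant features and
    # lightly penalise irrelevant ones.
    return len(subset & RELEVANT) - 0.1 * len(subset - RELEVANT)


if __name__ == "__main__":
    subset, score, evals = grasp_fss(FILTER_SCORES, toy_wrapper, iterations=5)
    print(sorted(subset), score, evals)  # → [0, 1, 2] 3.0 65
```

In this sketch every feature is tried once per construction, so the evaluation count grows with the number of features and iterations. A method aimed at high-dimensional data would need to cut that cost further, which is the goal the abstract sets for the proposed algorithm.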