Efficient feature selection filters for high-dimensional data

  • Authors:
  • Artur J. Ferreira; Mário A. T. Figueiredo

  • Affiliations:
  • Instituto Superior de Engenharia de Lisboa, Lisboa, Portugal and Instituto de Telecomunicações, Lisboa, Portugal; Instituto Superior Técnico, Lisboa, Portugal and Instituto de Telecomunicações, Lisboa, Portugal

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2012

Abstract

Feature selection is a central problem in machine learning and pattern recognition. On large datasets (in terms of dimension and/or number of instances), search-based or wrapper techniques can be computationally prohibitive. Moreover, many filter methods based on relevance/redundancy assessment also take prohibitively long on high-dimensional datasets. In this paper, we propose efficient unsupervised and supervised feature selection/ranking filters for high-dimensional datasets. These methods use low-complexity relevance and redundancy criteria, applicable to supervised, semi-supervised, and unsupervised learning, and can act as pre-processors that let computationally intensive methods focus on smaller subsets of promising features. Experimental results on datasets with up to 10^5 features show that our methods are time-efficient and achieve lower generalization error than state-of-the-art techniques, while being dramatically simpler and faster.
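
To make the idea of a low-complexity relevance/redundancy filter concrete, here is a minimal sketch in Python. The abstract does not state which criteria the paper uses, so everything specific here is an assumption made for illustration: variance stands in for the relevance measure, absolute cosine similarity stands in for the redundancy measure, and the function name `relevance_redundancy_filter`, its parameters, and the toy data are hypothetical. Comparing each candidate only with the most recently kept feature is likewise an assumed shortcut to keep the cost low, not a description of the authors' algorithm.

```python
import math

def variance(column):
    """Population variance of one feature column (assumed relevance measure)."""
    mean = sum(column) / len(column)
    return sum((x - mean) ** 2 for x in column) / len(column)

def abs_cosine(a, b):
    """Absolute cosine similarity of two columns (assumed redundancy measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return abs(dot) / norm if norm > 0 else 0.0

def relevance_redundancy_filter(rows, max_features, max_similarity=0.8):
    """Return indices of the kept features, most relevant first (hypothetical helper)."""
    columns = [list(col) for col in zip(*rows)]
    # Rank features by decreasing relevance; no class labels are needed.
    ranked = sorted(range(len(columns)),
                    key=lambda j: variance(columns[j]), reverse=True)
    kept = []
    for j in ranked:
        if len(kept) == max_features:
            break
        # Assumption: test redundancy only against the last kept feature,
        # so each candidate costs a single similarity computation.
        if not kept or abs_cosine(columns[j], columns[kept[-1]]) < max_similarity:
            kept.append(j)
    return kept

# Toy data: 4 instances, 4 features. Feature 0 is feature 1 scaled by 0.5.
rows = [
    [1.0, 2.0, 0.5, 0.9],
    [2.0, 4.0, 0.4, 0.1],
    [3.0, 6.0, 0.6, 0.8],
    [4.0, 8.0, 0.5, 0.2],
]
selected = relevance_redundancy_filter(rows, max_features=2)
print(selected)  # [1, 3]
```

In this toy run, feature 1 is kept first because it has the largest variance, feature 0 is discarded as redundant with it, and feature 3 is kept next. The reduced index list could then be handed to a more expensive wrapper or search-based method, which is the pre-processing role the abstract describes.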