Feature selection, mutual information, and the classification of high-dimensional patterns: Applications to image classification and microarray data analysis

  • Authors:
  • Boyan Bonev;Francisco Escolano;Miguel Cazorla

  • Affiliations:
  • Universidad de Alicante, Departamento Ciencia Computación e Inteligencia Artificial, Ap. Correos 99, 03080, Alicante, Spain;Universidad de Alicante, Departamento Ciencia Computación e Inteligencia Artificial, Ap. Correos 99, 03080, Alicante, Spain;Universidad de Alicante, Departamento Ciencia Computación e Inteligencia Artificial, Ap. Correos 99, 03080, Alicante, Spain

  • Venue:
  • Pattern Analysis & Applications - Special Issue: Non-parametric distance-based classification techniques and their applications
  • Year:
  • 2008

Abstract

We propose a novel feature selection filter for supervised learning, which relies on the efficient estimation of the mutual information between a high-dimensional set of features and the classes. We bypass the estimation of the probability density function by using the entropic-graphs approximation of the Rényi entropy and the subsequent approximation of the Shannon entropy. Thus, the complexity does not depend on the number of dimensions but on the number of patterns/samples, and the curse of dimensionality is circumvented. We show that it is then possible to outperform algorithms that rank features individually, as well as a greedy algorithm based on the maximal relevance and minimal redundancy criterion. We successfully test our method on both image classification and microarray data classification. For most of the tested data sets, we obtain better classification results than those reported in the literature.
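
The entropic-graph idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a minimum spanning tree (MST) as the entropic graph, uses the standard MST-based Rényi entropy estimator with order alpha = (d - gamma) / d, and plugs that Rényi estimate directly into the decomposition I(X;C) = H(X) - sum_c p(c) H(X|C=c), whereas the paper goes on to approximate the Shannon entropy. The function names and the `log_beta` bias constant (left at 0 here) are hypothetical placeholders introduced only for illustration.

```python
import math


def mst_length(points, gamma=1.0):
    """Sum of (edge length ** gamma) over a Euclidean MST, via O(n^2) Prim."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from each point to the tree
    best[0] = 0.0          # start the tree at the first point
    total = 0.0
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u] ** gamma
        # Relax distances of the remaining points against the new tree node.
        for v in range(n):
            if not in_tree[v]:
                dist = math.dist(points[u], points[v])
                if dist < best[v]:
                    best[v] = dist
    return total


def renyi_entropy_mst(points, gamma=1.0, log_beta=0.0):
    """MST-based Renyi entropy estimate of order alpha = (d - gamma) / d.

    Assumes d > gamma and at least two distinct points. `log_beta` stands in
    for the log of a bias constant that depends only on d and gamma; it is an
    assumption of this sketch and is not derived here.
    """
    n = len(points)
    d = len(points[0])
    alpha = (d - gamma) / d
    length = mst_length(points, gamma)
    return (d / gamma) * (math.log(length / n ** alpha) - log_beta)


def mutual_information(points, labels, gamma=1.0):
    """Plug-in estimate of I(X;C) = H(X) - sum_c p(c) * H(X | C=c).

    Uses the Renyi estimate in place of the Shannon entropy (an assumption of
    this sketch). The additive log_beta term cancels in this difference.
    """
    n = len(points)
    conditional = 0.0
    for c in set(labels):
        subset = [p for p, lab in zip(points, labels) if lab == c]
        conditional += (len(subset) / n) * renyi_entropy_mst(subset, gamma)
    return renyi_entropy_mst(points, gamma) - conditional
```

In a filter built on such an estimate, one would call `mutual_information` on the samples restricted to a candidate feature subset and prefer subsets with a higher score. Note that the cost is driven by the number of samples (pairwise distances and the MST), which is the property the abstract emphasizes.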