Stable feature selection via dense feature groups

  • Authors: Lei Yu; Chris Ding; Steven Loscalzo
  • Affiliations: Binghamton University, Binghamton, NY, USA; University of Texas at Arlington, Arlington, TX, USA; Binghamton University, Binghamton, NY, USA
  • Venue: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year: 2008

Abstract

Many feature selection algorithms have been proposed in the past, focusing on improving classification accuracy. In this work, we point out the importance of stable feature selection for knowledge discovery from high-dimensional data, and identify two causes of instability in feature selection algorithms: selection of a minimum subset without redundant features, and small sample size. We propose a general framework for stable feature selection which emphasizes both good generalization and stability of feature selection results. The framework identifies dense feature groups based on kernel density estimation and treats the features in each dense group as a coherent entity for feature selection. An efficient algorithm, DRAGS (Dense Relevant Attribute Group Selector), is developed under this framework. We also introduce a general measure for assessing the stability of feature selection algorithms. Our empirical study on microarray data verifies that dense feature groups remain stable under random sample hold-out, and that the DRAGS algorithm is effective in identifying a set of feature groups which exhibit both high classification accuracy and stability.
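The core step described in the abstract, grouping features around density peaks, can be illustrated with a minimal sketch. The sketch below treats each feature as a point in sample space and uses a Gaussian-kernel, mean-shift-style search to pull features toward density modes; features converging to the same mode form one dense group. The function name find_dense_feature_groups, the bandwidth, the merge tolerance, and the iteration count are illustrative assumptions, not the authors' DRAGS implementation.

```python
import numpy as np


def find_dense_feature_groups(X, bandwidth=1.0, merge_tol=1e-2, n_iter=50):
    """Group features by the density peak they converge to in sample space.

    Illustrative sketch only (not the paper's DRAGS code). Each feature
    (column of X) is a point in R^m, m = number of samples. A Gaussian-kernel
    mean-shift search moves each point toward a mode of the feature density;
    features whose modes coincide form one dense group.
    """
    points = X.T.astype(float)            # shape: (n_features, n_samples)
    modes = points.copy()

    for _ in range(n_iter):
        # Pairwise squared distances from current mode estimates to all features.
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))       # Gaussian kernel weights
        modes = (w @ points) / w.sum(axis=1, keepdims=True)

    # Merge features whose modes coincide (within merge_tol) into groups.
    groups = []
    for i, m in enumerate(modes):
        for g in groups:
            if np.linalg.norm(m - modes[g[0]]) < merge_tol:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two blocks of highly correlated (redundant) features plus noise features.
    base1 = rng.normal(size=(30, 1))
    base2 = rng.normal(size=(30, 1))
    X = np.hstack([base1 + 0.05 * rng.normal(size=(30, 5)),
                   base2 + 0.05 * rng.normal(size=(30, 5)),
                   rng.normal(size=(30, 3))])
    print(find_dense_feature_groups(X, bandwidth=2.0))
```

On this toy data the two correlated blocks collapse into two dense groups while the noise features remain singletons, which is the behavior the framework relies on: groups of redundant features are stable under resampling even when any single representative feature is not. A selector in the spirit of DRAGS would then rank such groups by relevance and return whole groups rather than individual features.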