A variance reduction framework for stable feature selection

  • Authors: Yue Han; Lei Yu

  • Affiliation (both authors): Department of Computer Science, Binghamton University, State University of New York, Binghamton, NY 13902-6000, USA

  • Venue: Statistical Analysis and Data Mining
  • Year: 2012

Abstract

Stability of feature selection is an important but under-addressed issue in knowledge discovery from high-dimensional data. In this study, we present a theoretical framework that relates the stability and the accuracy of feature selection through a formal bias-variance decomposition of feature selection error. The framework also reveals the connection between stability and sample size, and suggests a variance reduction approach for improving the stability of feature selection algorithms when the sample size is small. Following the theoretical framework, we propose an empirical variance reduction framework, margin-based instance weighting, which weights training instances according to their importance to feature evaluation. Our extensive experimental study first verifies the theoretical and empirical frameworks on synthetic data sets using a popular feature selection algorithm, SVM-RFE. Experiments on real-world microarray data sets further verify that the empirical framework is effective at reducing the variance and improving the subset stability of two representative feature selection algorithms, SVM-RFE and ReliefF, while maintaining comparable predictive accuracy with the selected features. The proposed instance weighting framework is also shown to be more effective and efficient than the ensemble framework at improving the subset stability of the feature selection algorithms under study. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012
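To make the notion of subset stability concrete, the following is a minimal illustrative sketch, not the authors' method or evaluation protocol. It assumes stability is measured as the average pairwise Jaccard similarity between feature subsets selected on random subsamples of the training data; the abstract does not specify the stability measure, so this choice is an assumption. The helper names (`jaccard`, `subset_stability`, `select_top_k`, `stability_under_subsampling`) are hypothetical, and the simple mean-difference scorer stands in for a real selector such as SVM-RFE or ReliefF.

```python
import itertools

import numpy as np


def jaccard(a, b):
    """Jaccard similarity between two collections of feature indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def subset_stability(subsets):
    """Average pairwise Jaccard similarity over all selected subsets.

    Assumption: the paper may use a different stability measure.
    """
    pairs = list(itertools.combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def select_top_k(X, y, k):
    """Hypothetical stand-in selector: rank features by the absolute
    difference of class means, scaled by the overall standard deviation.
    This is NOT SVM-RFE or ReliefF; it only keeps the sketch self-contained.
    """
    mean_pos = X[y == 1].mean(axis=0)
    mean_neg = X[y == 0].mean(axis=0)
    scale = X.std(axis=0) + 1e-12  # avoid division by zero
    scores = np.abs(mean_pos - mean_neg) / scale
    return np.argsort(scores)[::-1][:k]


def stability_under_subsampling(X, y, k, n_rounds=20, frac=0.8, seed=0):
    """Select k features on repeated random subsamples (without
    replacement) and report the stability of the selected subsets."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    subsets = []
    for _ in range(n_rounds):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        subsets.append(select_top_k(X[idx], y[idx], k))
    return subset_stability(subsets)


if __name__ == "__main__":
    # Synthetic small-sample, high-dimensional data: 40 instances,
    # 200 features, of which only the first 5 carry class signal.
    rng = np.random.default_rng(42)
    y = np.array([0] * 20 + [1] * 20)
    X = rng.normal(size=(40, 200))
    X[y == 1, :5] += 1.0
    print("stability:", stability_under_subsampling(X, y, k=10))
```

A value near 1 means nearly the same features are selected on every subsample, while a value near 0 means the selected subsets barely overlap. An instance weighting scheme like the one described in the abstract would be judged by whether it raises such a stability score without lowering predictive accuracy.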