A variance reduction framework for stable feature selection

Authors:
Yue Han;Lei Yu
Affiliations:
Department of Computer Science, Binghamton University, State University of New York, Binghamton, NY 13902-6000, USA;Department of Computer Science, Binghamton University, State University of New York, Binghamton, NY 13902-6000, USA
Venue:
Statistical Analysis and Data Mining
Year:
2012

Abstract

Stability of feature selection is an important but under-addressed issue in knowledge discovery from high-dimensional data. In this study, we present a theoretical framework that relates the stability and the accuracy of feature selection through a formal bias-variance decomposition of feature selection error. The framework also reveals the connection between stability and sample size, and suggests a variance reduction approach for improving the stability of feature selection algorithms when the sample size is small. Following the theoretical framework, we propose an empirical variance reduction framework, margin-based instance weighting, which weights training instances according to their importance to feature evaluation. Our extensive experimental study first verifies the theoretical and empirical frameworks on synthetic data sets with a popular feature selection algorithm, SVM-RFE. Experiments on real-world microarray data sets further verify that the empirical framework is effective at reducing the variance and improving the subset stability of two representative feature selection algorithms, SVM-RFE and ReliefF, while maintaining comparable predictive accuracy based on the selected features. The proposed instance weighting framework is also shown to be more effective and efficient than the ensemble framework at improving the subset stability of the feature selection algorithms under study. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012
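
To make the notion of subset stability concrete, the following minimal sketch scores how much the feature subsets selected from different resampled training sets agree with one another. It is an illustration under stated assumptions, not the measure used in the paper: the mean pairwise Jaccard similarity is swapped in as a simple stand-in, and the names `jaccard`, `subset_stability`, and `runs` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature subsets (collections of feature indices)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty subsets count as identical
    return len(a & b) / len(a | b)


def subset_stability(subsets):
    """Mean pairwise Jaccard similarity over all pairs of selected subsets.

    Assumption: this is an illustrative stability score, not the paper's metric.
    Values near 1 mean the selector picks nearly the same features on every run.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical subsets selected from three resampled training sets.
runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
print(subset_stability(runs))
```

In this reading, a variance reduction approach such as instance weighting would be judged by whether it raises a score like this one across resampled training sets while predictive accuracy on the selected features stays comparable.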