A Variance Reduction Framework for Stable Feature Selection

Authors:
Yue Han; Lei Yu
Venue:
ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining
Year:
2010

Cited by (6):

A novel stability based feature selection framework for k-means clustering
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II

Stable Gene Selection from Microarray Data via Sample Weighting
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Energy-based feature selection and its ensemble version
ICONIP'11 Proceedings of the 18th international conference on Neural Information Processing - Volume Part II

Simultaneous sample and gene selection using t-score and approximate support vectors
PRIB'13 Proceedings of the 8th IAPR international conference on Pattern Recognition in Bioinformatics

Analysis of feature selection stability on high dimension and small sample data
Computational Statistics & Data Analysis

Feature selection for k-means clustering stability: theoretical analysis and an algorithm
Data Mining and Knowledge Discovery

Abstract

Besides high accuracy, the stability of feature selection has recently attracted strong interest in knowledge discovery from high-dimensional data. In this study, we present a theoretical framework for the relationship between the stability and the accuracy of feature selection, based on a formal bias-variance decomposition of feature selection error. The framework also suggests a variance reduction approach for improving the stability of feature selection algorithms. Furthermore, we propose an empirical variance reduction framework, margin based instance weighting, which weights training instances according to their influence on the estimation of feature relevance. We also develop an efficient algorithm under this framework. Experiments on synthetic data and real-world microarray data verify both the theoretical framework and the effectiveness of the proposed algorithm at reducing variance. The proposed algorithm is also shown to improve subset stability effectively while maintaining comparable classification accuracy with the selected features.
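The abstract mentions a bias-variance decomposition of feature selection error but does not state it. As a hedged illustration, assume the selector maps a training set D to a feature relevance weight vector r(D) in R^d, that r* is the true relevance vector, and that error is measured by squared loss. These are assumptions of this sketch and not necessarily the paper's exact definitions. With the mean weight vector r̄ = E_D[r(D)], the standard identity reads:

```latex
\mathbb{E}_D\big[\lVert r(D) - r^*\rVert^2\big]
  = \underbrace{\lVert \bar{r} - r^*\rVert^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\big[\lVert r(D) - \bar{r}\rVert^2\big]}_{\text{variance}}
```

It follows from writing r(D) - r* = (r(D) - r̄) + (r̄ - r*) and expanding the square: the cross term 2 E_D[r(D) - r̄]ᵀ(r̄ - r*) vanishes because E_D[r(D) - r̄] = 0. Under this reading, the variance term measures how much the estimated weights fluctuate across training sets (low variance means stable selection). Reducing it therefore lowers the total error unless the bias grows by more.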
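The abstract describes margin based instance weighting only at a high level. The sketch below is one plausible reading, built on several assumptions. Each instance is mapped to a Relief-style per-feature margin vector, computed as the distance to the nearest miss minus the distance to the nearest hit. Instances that lie far from the others in that space are then down-weighted, and the variance term is estimated over bootstrap resamples drawn within each class. The helper names (`margin_vectors`, `instance_weights`, `relevance`, `weight_variance`), the L1 distance, the inverse-mean-distance weighting rule, and the simple class-mean-difference scorer are all hypothetical choices. None of them should be read as the authors' algorithm.

```python
import random
from statistics import mean


def l1(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(u - v) for u, v in zip(a, b))


def margin_vectors(X, y):
    """Map each instance to a Relief-style per-feature margin vector:
    m_j = |x_j - miss_j| - |x_j - hit_j|, where hit and miss are the nearest
    same-class and other-class instances (the instance itself is excluded)."""
    n = len(X)
    out = []
    for i in range(n):
        xi = X[i]
        hit = min((X[k] for k in range(n) if k != i and y[k] == y[i]),
                  key=lambda z: l1(xi, z))
        miss = min((X[k] for k in range(n) if y[k] != y[i]),
                   key=lambda z: l1(xi, z))
        out.append([abs(a - m) - abs(a - h) for a, h, m in zip(xi, hit, miss)])
    return out


def instance_weights(X, y):
    """ASSUMED weighting rule (hypothetical): weight = 1 / (mean L1 distance
    to all other instances in margin space), so outlying instances count
    less; the weights are rescaled to average 1."""
    M = margin_vectors(X, y)
    n = len(M)
    raw = [1.0 / (mean(l1(M[i], M[k]) for k in range(n) if k != i) + 1e-12)
           for i in range(n)]
    total = sum(raw)
    return [n * r / total for r in raw]


def relevance(X, y, w):
    """Stand-in scorer (assumption): per-feature absolute difference between
    the instance-weighted class means (labels must be 0/1)."""
    scores = []
    for j in range(len(X[0])):
        num, den = [0.0, 0.0], [0.0, 0.0]
        for xi, yi, wi in zip(X, y, w):
            num[yi] += wi * xi[j]
            den[yi] += wi
        scores.append(abs(num[1] / den[1] - num[0] / den[0]))
    return scores


def weight_variance(X, y, weighted, B=30, seed=0):
    """Estimate the variance term E_D||r(D) - r_bar||^2 from B bootstrap
    resamples drawn within each class (so both classes are always present)."""
    rng = random.Random(seed)
    by_class = [[i for i, yi in enumerate(y) if yi == c] for c in (0, 1)]
    runs = []
    for _ in range(B):
        idx = [rng.choice(ix) for ix in by_class for _ in ix]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        w = instance_weights(Xb, yb) if weighted else [1.0] * len(idx)
        runs.append(relevance(Xb, yb, w))
    r_bar = [mean(col) for col in zip(*runs)]
    return mean(sum((a - c) ** 2 for a, c in zip(r, r_bar)) for r in runs)


# Usage on synthetic data: features 0 and 1 are shifted by +1 in class 1,
# features 2..9 are pure noise; 20 instances per class.
rng = random.Random(1)
y = [c for c in (0, 1) for _ in range(20)]
X = [[rng.gauss(1.0 if (c == 1 and j < 2) else 0.0, 1.0) for j in range(10)]
     for c in y]
print("uniform weights:", weight_variance(X, y, weighted=False))
print("margin weights: ", weight_variance(X, y, weighted=True))
```

The two printed numbers are the empirical variance term with and without instance weighting on this one toy draw. Which of the two is smaller depends on the data, and neither number should be read as a result reported in the paper. Computing the weights inside each bootstrap loop, rather than once on the full data, keeps the weighting part of the estimator whose variance is being measured.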