Improving stability of feature selection methods

Authors:
Pavel Křížek; Josef Kittler; Václav Hlaváč
Affiliations:
Czech Technical University in Prague, Center for Machine Perception, Czech Republic; University of Surrey, Centre for Vision, Speech, and Signal Processing, Guildford, United Kingdom; Czech Technical University in Prague, Center for Machine Perception, Czech Republic
Venue:
CAIP'07 Proceedings of the 12th international conference on Computer analysis of images and patterns
Year:
2007

Abstract

Improper design of feature selection methods can often lead to incorrect conclusions. Moreover, it is not generally realised that the values of the criterion function guiding the search for the best feature set are random variables with some probability distribution. This contribution examines how several estimation techniques influence the consistency of the final result. We propose an entropy-based measure that assesses the stability of feature selection methods with respect to perturbations in the data. Results show that filters achieve better stability and performance when more samples are employed for the estimation, for instance with leave-one-out cross-validation. For wrappers, however, the best results are obtained with 50/50 holdout validation.
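
The abstract does not give the definition of the proposed entropy-based measure, so the following is only a minimal sketch of the general idea and not the authors' formula. It assumes that a feature selector has been run several times on perturbed versions of the data (for example, different resamplings) and that stability is judged by the Shannon entropy of the empirical distribution of the selected subsets: zero entropy means every run chose the same subset. The function names `subset_entropy` and `normalized_instability` are hypothetical, as is the normalisation by the logarithm of the number of runs.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence


def subset_entropy(selections: Sequence[Iterable[Hashable]]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of selected subsets.

    Each element of `selections` is the feature subset chosen in one run.
    Assumption: two runs agree only if they select exactly the same subset.
    """
    if not selections:
        raise ValueError("need at least one selection run")
    n = len(selections)
    # Count how often each distinct subset was selected (order-insensitive).
    counts = Counter(frozenset(s) for s in selections)
    # sum p * log2(1/p), written with n/c to avoid a negative zero.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def normalized_instability(selections: Sequence[Iterable[Hashable]]) -> float:
    """Entropy scaled to [0, 1]: 0 = always the same subset, 1 = all runs differ.

    Assumption: the maximum entropy for n runs is log2(n), reached when
    every run returns a different subset.
    """
    n = len(selections)
    if n < 2:
        return 0.0
    return subset_entropy(selections) / math.log2(n)


if __name__ == "__main__":
    # Hypothetical selections from four perturbed training sets.
    stable_runs = [{1, 2}, {2, 1}, {1, 2}, {1, 2}]
    unstable_runs = [{1, 2}, {1, 3}, {2, 3}, {1, 4}]
    print(normalized_instability(stable_runs))    # 0.0
    print(normalized_instability(unstable_runs))  # 1.0
```

Comparing this score across estimation schemes (for instance leave-one-out cross-validation versus a 50/50 holdout split) would indicate which scheme yields more consistent selections under this particular, assumed notion of stability.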