Evaluating Stability and Comparing Output of Feature Selectors that Optimize Feature Subset Cardinality

Authors:
Petr Somol; Jana Novovicova
Affiliations:
Institute of Information Theory and Automation of the Czech Academy of Sciences, Prague; Institute of Information Theory and Automation of the Czech Academy of Sciences, Prague
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2010

Citing 0
Cited 9

Feature selection with complexity measure in a quadratic programming setting
IbPRIA'11 Proceedings of the 5th Iberian conference on Pattern recognition and image analysis

Feature selection stability assessment based on the Jensen-Shannon divergence
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part I

Sparse and stable gene selection with consensus SVM-RFE
Pattern Recognition Letters

Feature extraction in protein sequences classification: a new stability measure
Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Toward an efficient and scalable feature selection approach for internet traffic classification
Computer Networks: The International Journal of Computer and Telecommunications Networking

Simultaneous sample and gene selection using t-score and approximate support vectors
PRIB'13 Proceedings of the 8th IAPR international conference on Pattern Recognition in Bioinformatics

Analysis of feature selection stability on high dimension and small sample data
Computational Statistics & Data Analysis

A survey on feature selection methods
Computers and Electrical Engineering

Binary social impact theory based optimization and its applications in pattern recognition
Neurocomputing

Quantified Score

Hi-index	0.15

Abstract

The stability (robustness) of feature selection methods is a topic of recent interest whose importance is nevertheless often neglected, even though it directly affects the reliability of machine learning systems. We investigate the problem of evaluating the stability of feature selection processes that yield subsets of varying size. We introduce several novel feature selection stability measures and adjust some existing measures within a unifying framework that offers broad insight into the stability problem. We study the properties of the considered measures in detail and demonstrate on various examples what information about the feature selection process can be gained from them. We also introduce an alternative approach to feature selection evaluation: similarity measures that compare two feature selection processes, e.g., the output of two feature selection methods or two runs of one method with different parameters. The information obtained using the considered stability and similarity measures is shown to be usable for assessing feature selection methods (or criteria) as such.
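
To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch. It is not taken from the paper and does not claim to reproduce any of its measures. It assumes that each run of a feature selector returns a set of feature indices, possibly of a different size per run, and it uses the Jaccard (Tanimoto) index as the pairwise subset similarity. The function names `jaccard`, `stability` and `similarity`, as well as the toy data, are hypothetical.

```python
from itertools import combinations, product


def jaccard(a, b):
    """Jaccard (Tanimoto) index of two feature subsets; sizes may differ."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty subsets are identical
    return len(a & b) / len(a | b)


def stability(runs):
    """Average pairwise Jaccard index over all runs of ONE selection process."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("at least two runs are needed")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def similarity(runs_a, runs_b):
    """Average Jaccard index over all cross pairs of runs from TWO processes."""
    pairs = list(product(runs_a, runs_b))
    if not pairs:
        raise ValueError("both processes need at least one run")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy data (assumed): subsets selected on different data samples.
runs_a = [{0, 1, 2}, {0, 1}, {0, 1, 2, 3}]  # process A, three runs
runs_b = [{0, 1}, {0, 1, 4}]                # process B, two runs

if __name__ == "__main__":
    print("stability of A:", round(stability(runs_a), 4))
    print("similarity of A and B:", round(similarity(runs_a, runs_b), 4))
```

Both scores lie in [0, 1], with 1 meaning that every compared pair of subsets is identical. Because the Jaccard index normalizes by the size of the union, it remains defined when the subsets differ in cardinality, which is the setting the abstract addresses.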