Feature selection stability assessment based on the Jensen-Shannon divergence

  • Authors:
  • Roberto Guzmán-Martínez; Rocío Alaiz-Rodríguez

  • Affiliations:
  • Servicio de Informática y Comunicaciones, Universidad de León, León, Spain; Dpto. de Ingeniería Eléctrica y de Sistemas, Universidad de León, León, Spain

  • Venue:
  • ECML PKDD'11: Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases, Volume Part I
  • Year:
  • 2011

Abstract

Feature selection and ranking techniques play an important role in the analysis of high-dimensional data. In particular, their stability becomes crucial when the feature importance is later studied in order to better understand the underlying process. The fact that a small change in the dataset may affect the outcome of the feature selection/ranking algorithm has long been overlooked in the literature. We propose an information-theoretic approach, using the Jensen-Shannon divergence, to assess this stability (or robustness). Unlike other measures, this new metric is suitable for different algorithm outcomes: full ranked lists, partial sublists (top-k lists), as well as the least studied partial ranked lists. This generalized metric measures the disagreement among a whole set of lists of the same size, following a probabilistic approach and giving more importance to differences that appear at the top of the lists. We illustrate it and compare it with popular metrics such as the Spearman rank correlation and Kuncheva's index on artificially generated feature selection/ranking outcomes and on a spectral fat dataset with different filter-based feature selectors.
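
The abstract does not give the metric's exact formulation, so the sketch below is only a minimal illustration of the general idea: each ranked feature list is mapped to a probability distribution that places more mass on top-ranked positions, and a generalized Jensen-Shannon divergence over the set of distributions is then turned into a stability score. The 1/rank weighting, the normalization by log2 of the number of lists, and all function names are assumptions made for this example, not the paper's definition.

```python
import numpy as np

def rank_to_distribution(ranking, n_features):
    """Map a ranked list of feature indices to a probability distribution.

    Illustrative assumption: a 1/rank weighting, so disagreements near the
    top of the list contribute more to the divergence. The paper may use a
    different rank-to-probability mapping.
    """
    p = np.zeros(n_features)
    for position, feature in enumerate(ranking, start=1):
        p[feature] = 1.0 / position
    return p / p.sum()

def js_divergence(distributions):
    """Generalized Jensen-Shannon divergence of a set of distributions."""
    distributions = np.asarray(distributions, dtype=float)
    mixture = distributions.mean(axis=0)

    def entropy(p):
        q = p[p > 0]
        return -np.sum(q * np.log2(q))

    return entropy(mixture) - np.mean([entropy(p) for p in distributions])

def stability(rankings, n_features):
    """Stability in [0, 1]: 1 means identical rankings, lower means more disagreement."""
    dists = [rank_to_distribution(r, n_features) for r in rankings]
    # Normalize by log2(k), an upper bound of the generalized JS divergence
    # for k equally weighted distributions (illustrative normalization).
    return 1.0 - js_divergence(dists) / np.log2(len(rankings))

# Example: three full rankings of 5 features obtained from resampled runs
rankings = [[0, 1, 2, 3, 4], [0, 2, 1, 3, 4], [1, 0, 2, 4, 3]]
print(stability(rankings, n_features=5))
```

In this toy usage, rankings that agree on the top positions yield a score close to 1, while rankings that shuffle the leading features are penalized more heavily than those that only differ near the bottom of the list.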