Stability of feature selection algorithms: a study on high-dimensional spaces

Authors:
Alexandros Kalousis; Julien Prados; Melanie Hilario
Affiliations:
University of Geneva, Computer Science Department, Geneva, Switzerland (all three authors)
Venue:
Knowledge and Information Systems
Year:
2007

Abstract

With the proliferation of extremely high-dimensional data, feature selection algorithms have become indispensable components of the learning process. Strangely, despite extensive work on the stability of learning algorithms, the stability of feature selection algorithms has been relatively neglected. This study is an attempt to fill that gap by quantifying the sensitivity of feature selection algorithms to variations in the training set. We assess the stability of feature selection algorithms based on the stability of the feature preferences that they express in the form of weights or scores, ranks, or a selected feature subset. We examine a number of measures to quantify the stability of feature preferences and propose an empirical way to estimate them. We perform a series of experiments with several feature selection algorithms on a set of proteomics datasets. The experiments allow us to explore the merits of each stability measure and create stability profiles of the feature selection algorithms. Finally, we show how stability profiles can support the choice of a feature selection algorithm.
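
The idea of estimating stability empirically can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the similarity measures or the resampling scheme, so the choices here (Pearson correlation for weights, Spearman correlation for ranks, Jaccard overlap for top-k subsets, and random subsampling of the training set) are assumptions made for illustration. The names `toy_selector` and `stability_profile` are hypothetical.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def ranks(weights):
    """Rank 1 goes to the largest weight; ties are broken by feature index."""
    order = sorted(range(len(weights)), key=lambda i: -weights[i])
    result = [0] * len(weights)
    for position, feature in enumerate(order, start=1):
        result[feature] = position
    return result


def top_k(weights, k):
    """Indices of the k highest-weighted features, as a set."""
    order = sorted(range(len(weights)), key=lambda i: -weights[i])
    return set(order[:k])


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def toy_selector(X, y):
    """Hypothetical scorer: absolute difference of class means per feature."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        if not pos or not neg:  # a subsample may miss one class entirely
            scores.append(0.0)
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores


def stability_profile(X, y, selector, k=2, n_resamples=10, frac=0.9, seed=0):
    """Average pairwise similarity of feature preferences over random subsamples."""
    rng = random.Random(seed)
    n = len(X)
    m = max(2, int(frac * n))
    runs = []
    for _ in range(n_resamples):
        idx = rng.sample(range(n), m)  # assumed resampling scheme
        runs.append(selector([X[i] for i in idx], [y[i] for i in idx]))
    pairs = list(combinations(runs, 2))

    def mean(values):
        return sum(values) / len(values)

    return {
        "weights": mean([pearson(a, b) for a, b in pairs]),
        # Spearman correlation is Pearson correlation computed on ranks.
        "ranks": mean([pearson(ranks(a), ranks(b)) for a, b in pairs]),
        "subsets": mean([jaccard(top_k(a, k), top_k(b, k)) for a, b in pairs]),
    }
```

Calling `stability_profile(X, y, toy_selector)` on a list of feature rows `X` and binary labels `y` returns one score per kind of feature preference. Values near 1 indicate that the selector's output changes little when the training set varies.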