Consensus group stable feature selection

Authors:
Steven Loscalzo; Lei Yu; Chris Ding
Affiliations:
Binghamton University, Binghamton, NY, USA; Binghamton University, Binghamton, NY, USA; University of Texas at Arlington, Arlington, TX, USA
Venue:
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2009

Abstract

Stability is an important yet under-addressed issue in feature selection from high-dimensional, small-sample data. In this paper, we show that the stability of feature selection depends strongly on sample size. We propose a novel framework for stable feature selection that first identifies consensus feature groups by subsampling the training samples, and then performs feature selection by treating each consensus feature group as a single entity. Experiments on both synthetic and real-world data sets show that an algorithm developed under this framework effectively alleviates the small-sample-size problem, yielding more stable feature selection results and comparable or better generalization performance than state-of-the-art feature selection algorithms. Synthetic data sets and algorithm source code are available at http://www.cs.binghamton.edu/~lyu/KDD09/.
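
The two-stage idea in the abstract (find consensus feature groups across subsamples, then select among groups rather than individual features) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say how groups are found or how relevance is scored, so the sketch assumes correlation thresholding for grouping, a co-occurrence vote across subsamples for consensus, and absolute Pearson correlation with the label for ranking. All function names, parameters, and default values are hypothetical.

```python
import random
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0

def components(n, linked):
    """Label nodes 0..n-1 by connected component, given a linked(i, j) predicate."""
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for other in range(n):
                if labels[other] == -1 and linked(node, other):
                    labels[other] = current
                    stack.append(other)
        current += 1
    return labels

def feature_groups(columns, threshold):
    """Assumed grouping rule: link features whose |correlation| reaches the threshold."""
    n = len(columns)
    corr = [[abs(pearson(columns[i], columns[j])) for j in range(n)] for i in range(n)]
    return components(n, lambda i, j: corr[i][j] >= threshold)

def consensus_groups(columns, n_subsamples=20, fraction=0.8,
                     threshold=0.8, agreement=0.5, seed=0):
    """Group features on random row subsamples; keep pairs that co-occur often enough."""
    rng = random.Random(seed)
    n_features, n_samples = len(columns), len(columns[0])
    size = max(2, int(fraction * n_samples))
    together = [[0] * n_features for _ in range(n_features)]
    for _ in range(n_subsamples):
        rows = rng.sample(range(n_samples), size)  # subsample without replacement
        labels = feature_groups([[col[r] for r in rows] for col in columns], threshold)
        for i in range(n_features):
            for j in range(n_features):
                together[i][j] += labels[i] == labels[j]
    return components(n_features,
                      lambda i, j: together[i][j] / n_subsamples >= agreement)

def select_groups(columns, y, groups, k):
    """Treat each group as one entity (mean of its members) and return the top-k group ids."""
    scores = {}
    for g in set(groups):
        members = [col for col, label in zip(columns, groups) if label == g]
        representative = [sum(vals) / len(vals) for vals in zip(*members)]
        scores[g] = abs(pearson(representative, y))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Here `columns` is a list of feature columns (one list of values per feature) and `y` is a numeric label vector. Calling `consensus_groups(columns)` returns one group id per feature, and `select_groups(columns, y, groups, k)` returns the ids of the `k` highest-scoring groups. Averaging group members is only one way to treat a group as a single entity; picking one representative feature per group would be an equally plausible choice under the same assumptions.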