Stable feature selection via dense feature groups

  • Authors: Lei Yu; Chris Ding; Steven Loscalzo
  • Affiliations: Binghamton University, Binghamton, NY, USA; University of Texas at Arlington, Arlington, TX, USA; Binghamton University, Binghamton, NY, USA
  • Venue: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year: 2008

Abstract

Many feature selection algorithms have been proposed in the past, focusing on improving classification accuracy. In this work, we point out the importance of stable feature selection for knowledge discovery from high-dimensional data, and identify two causes of instability in feature selection algorithms: selection of a minimum subset without redundant features, and small sample size. We propose a general framework for stable feature selection which emphasizes both good generalization and stability of feature selection results. The framework identifies dense feature groups based on kernel density estimation and treats the features in each dense group as a coherent entity for feature selection. An efficient algorithm, DRAGS (Dense Relevant Attribute Group Selector), is developed under this framework. We also introduce a general measure for assessing the stability of feature selection algorithms. Our empirical study on microarray data verifies that dense feature groups remain stable under random sample hold-out, and that the DRAGS algorithm is effective in identifying a set of feature groups which exhibit both high classification accuracy and stability.
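The core step described in the abstract, grouping features around density peaks, can be illustrated with a minimal sketch. The sketch below treats each feature as a point in sample space and uses a Gaussian-kernel, mean-shift-style search to pull features toward density modes; features converging to the same mode form one dense group. The function name find_dense_feature_groups, the bandwidth, the merge tolerance, and the iteration count are illustrative assumptions, not the authors' DRAGS implementation.

```python
import numpy as np


def find_dense_feature_groups(X, bandwidth=1.0, merge_tol=1e-2, n_iter=50):
    """Group features by the density peak they converge to in sample space.

    Illustrative sketch only (not the paper's DRAGS code). Each feature
    (column of X) is a point in R^m, m = number of samples. A Gaussian-kernel
    mean-shift search moves each point toward a mode of the feature density;
    features whose modes coincide form one dense group.
    """
    points = X.T.astype(float)            # shape: (n_features, n_samples)
    modes = points.copy()

    for _ in range(n_iter):
        # Pairwise squared distances from current mode estimates to all features.
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))       # Gaussian kernel weights
        modes = (w @ points) / w.sum(axis=1, keepdims=True)

    # Merge features whose modes coincide (within merge_tol) into groups.
    groups = []
    for i, m in enumerate(modes):
        for g in groups:
            if np.linalg.norm(m - modes[g[0]]) < merge_tol:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two blocks of highly correlated (redundant) features plus noise features.
    base1 = rng.normal(size=(30, 1))
    base2 = rng.normal(size=(30, 1))
    X = np.hstack([base1 + 0.05 * rng.normal(size=(30, 5)),
                   base2 + 0.05 * rng.normal(size=(30, 5)),
                   rng.normal(size=(30, 3))])
    print(find_dense_feature_groups(X, bandwidth=2.0))
```

On this toy data the two correlated blocks collapse into two dense groups while the noise features remain singletons, which is the behavior the framework relies on: groups of redundant features are stable under resampling even when any single representative feature is not. A selector in the spirit of DRAGS would then rank such groups by relevance and return whole groups rather than individual features.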