Feature selection for classifying high-dimensional numerical data

Authors:
Yimin Wu;Aidong Zhang
Affiliations:
Department of Computer Science and Engineering, SUNY at Buffalo;Department of Computer Science and Engineering, SUNY at Buffalo
Venue:
CVPR'04 Proceedings of the 2004 IEEE computer society conference on Computer vision and pattern recognition
Year:
2004

Abstract

Classifying high-dimensional numerical data is a challenging problem. In high-dimensional feature spaces, the performance of supervised learning methods suffers from the curse of dimensionality, which degrades both classification accuracy and efficiency. To address this issue, we present an efficient feature selection method for classifying high-dimensional numerical data. Our method uses balanced information gain to measure each feature's contribution to classification, and it computes feature correlation with a novel extension of balanced information gain. By integrating feature contribution and correlation, our approach uses a forward sequential selection algorithm to select uncorrelated features with large balanced information gain. Extensive experiments on image and gene microarray datasets demonstrate the effectiveness and robustness of the presented method.
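
The abstract does not give the formulas, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It makes several assumptions that are not taken from the paper: features are discretized with equal-width bins, "balanced" information gain is taken to be information gain divided by log2 of the number of occupied bins, and feature correlation is approximated by applying the same score with another feature's bins in place of the class labels. The names `balanced_gain`, `select_features`, `k_bins`, and `max_corr` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def equal_width_bins(values, k):
    """Discretize numeric values into k equal-width bins (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    return [min(int((v - lo) / width), k - 1) for v in values]


def info_gain(bins, labels):
    """Reduction in label entropy after partitioning the samples by bin."""
    n = len(labels)
    groups = {}
    for b, y in zip(bins, labels):
        groups.setdefault(b, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def balanced_gain(bins, labels):
    """Assumed form: information gain normalized by log2(#occupied bins)."""
    k = len(set(bins))
    if k < 2:
        return 0.0
    return info_gain(bins, labels) / math.log2(k)


def select_features(columns, labels, k_bins=2, max_corr=0.5):
    """Greedy forward selection: visit features by decreasing score and
    keep one only if it is weakly correlated with every feature kept so far."""
    binned = [equal_width_bins(col, k_bins) for col in columns]
    scores = [balanced_gain(b, labels) for b in binned]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    selected = []
    for j in order:
        if scores[j] <= 0:
            break  # remaining features carry no measurable information
        # Stand-in correlation: score of feature j against feature s's bins.
        if all(balanced_gain(binned[j], binned[s]) <= max_corr for s in selected):
            selected.append(j)
    return selected


# Toy data: feature 1 is a rescaled copy of feature 0; feature 2 is weaker
# but not redundant with feature 0.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0.0, 0.1, 0.2, 0.3, 0.7, 0.8, 0.9, 1.0],
    [0.0, 0.2, 0.4, 0.6, 1.4, 1.6, 1.8, 2.0],
    [0.0, 0.1, 0.2, 0.9, 0.1, 0.8, 0.9, 1.0],
]
print(select_features(columns, labels))  # [0, 2]
```

In this toy run the redundant copy (feature 1) is skipped even though its score ties with feature 0, which is the behavior the abstract describes as selecting uncorrelated features with large balanced information gain. The threshold `max_corr` and the binning scheme would need tuning on real data.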