Fast wrapper feature subset selection in high-dimensional datasets by means of filter re-ranking

  • Authors:
  • Pablo Bermejo, Luis de la Ossa, José A. Gámez, José M. Puerta

  • Affiliations:
  • Department of Computing Systems, Intelligent Systems and Data Mining Laboratory (I3A), University of Castilla-La Mancha, Albacete 02071, Spain (all authors)

  • Venue:
  • Knowledge-Based Systems
  • Year:
  • 2012

Abstract

This paper deals with the problem of supervised wrapper-based feature subset selection in datasets with a very large number of attributes. The literature has recently seen numerous hybrid selection algorithms that first build a filter ranking and then perform an incremental wrapper selection over that ranking. Although these methods work well, they have two drawbacks: (1) depending on the complexity of the wrapper search method, the number of wrapper evaluations can still be too large; and (2) they rely on a univariate ranking that does not take into account interactions between the variables already included in the selected subset and the remaining ones. Here we propose a new approach whose main goal is to drastically reduce the number of wrapper evaluations while maintaining good performance (e.g., accuracy and size of the obtained subset). To do this we propose an algorithm that iteratively alternates between filter ranking construction and wrapper feature subset selection (FSS). Thus, the FSS only uses the first block of ranked attributes, and the ranking method uses the currently selected subset in order to build a new ranking in which this knowledge is taken into account. The algorithm terminates when no new attribute is selected in the last call to the FSS algorithm. The main advantage of this approach is that only a few blocks of variables are analyzed, so the number of wrapper evaluations decreases drastically. The proposed method is tested over eleven high-dimensional datasets (2,400-46,000 variables) using different classifiers. The results show an impressive reduction in the number of wrapper evaluations without degrading the quality of the obtained subset.
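The alternation described in the abstract (rank, select within one block, re-rank, stop when a block yields no new attribute) can be sketched as below. This is a minimal illustration, not the paper's algorithm: the filter score here is a plain univariate correlation that merely excludes already-selected features (the paper's re-ranking conditions the score on the selected subset), and the wrapper evaluation is leave-one-out 1-NN accuracy chosen only for self-containment. All function names and the synthetic data are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0

def filter_rank(X, y, selected):
    """Rank unselected features by |correlation| with the class.
    NOTE: a univariate stand-in; the paper's re-ranking step would
    use `selected` to condition the score, not just to exclude."""
    scores = {j: abs(pearson([row[j] for row in X], y))
              for j in range(len(X[0])) if j not in selected}
    return sorted(scores, key=scores.get, reverse=True)

def wrapper_accuracy(X, y, subset):
    """Leave-one-out 1-NN accuracy on the candidate subset
    (a stand-in for whatever classifier the wrapper uses)."""
    if not subset:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_d, pred = float("inf"), None
        for k in range(len(X)):
            if k == i:
                continue
            d = sum((X[i][j] - X[k][j]) ** 2 for j in subset)
            if d < best_d:
                best_d, pred = d, y[k]
        correct += (pred == y[i])
    return correct / len(X)

def re_ranking_fss(X, y, block=5):
    """Alternate filter re-ranking with an incremental wrapper pass
    over the first `block` ranked attributes; stop when a full pass
    selects nothing new (the termination rule from the abstract)."""
    selected, best = [], 0.0
    while True:
        ranking = filter_rank(X, y, set(selected))
        added = False
        for j in ranking[:block]:
            acc = wrapper_accuracy(X, y, selected + [j])
            if acc > best:
                selected.append(j)
                best = acc
                added = True
        if not added:
            return selected, best

# Hypothetical synthetic data: only feature 0 carries the class signal.
random.seed(0)
y = [i % 2 for i in range(40)]
X = [[y[i] + random.gauss(0, 0.1)] + [random.gauss(0, 1) for _ in range(19)]
     for i in range(40)]
sel, acc = re_ranking_fss(X, y, block=5)
```

Because each pass evaluates only one block of the ranking, the wrapper evaluation count grows with the number of blocks actually visited rather than with the full attribute count, which is the source of the drastic reduction the abstract claims.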