Optimization-based feature selection with adaptive instance sampling

Authors:
Jaekyung Yang; Sigurdur Olafsson
Affiliations:
Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA; Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA
Venue:
Computers and Operations Research
Year:
2006

Abstract

Preprocessing the data to filter out redundant and irrelevant features is one of the most important steps in the data mining process. Careful feature selection may reduce the computational time of inducing subsequent models and improve the quality of those models. Using fewer features often leads to simpler, easier-to-interpret models, and identifying the important features can yield valuable insights into the application. The feature selection problem is inherently a combinatorial optimization problem. This paper builds on a metaheuristic called the nested partitions method, which has been shown to be particularly effective for the feature selection problem. Specifically, we focus on the scalability of the method and show that its performance is vastly improved by incorporating random sampling of instances. Furthermore, we develop an adaptive variant of the algorithm that dynamically determines the required sample rate. The adaptive algorithm is shown to perform very well when applied to a set of standard test problems.
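The idea of selecting features on a random sample of instances, and growing the sample only as far as needed, can be illustrated with a minimal sketch. This is not the nested partitions method or the adaptive rule developed in the paper: a simple agreement-based ranking stands in for the search, and the stopping rule (stop once two successive sample sizes select the same features) is an assumption made for illustration. All function names, parameters, and the toy data are hypothetical.

```python
import random


def make_toy_data(n_rows, seed=0):
    """Hypothetical binary data set: feature 0 equals the label, feature 1
    is the label with 10% noise, features 2..5 are irrelevant coin flips."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_rows):
        y = rng.randint(0, 1)
        noisy = y if rng.random() > 0.1 else 1 - y
        x = [y, noisy] + [rng.randint(0, 1) for _ in range(4)]
        data.append((x, y))
    return data


def sample_instances(data, rate, rng):
    """Draw a random subset of instances (rows) without replacement."""
    k = max(1, int(len(data) * rate))
    return rng.sample(data, k)


def feature_score(rows, j):
    """Toy relevance score: fraction of rows where binary feature j agrees
    with the label, folded so that anti-correlation also counts."""
    agree = sum(1 for x, y in rows if x[j] == y)
    frac = agree / len(rows)
    return max(frac, 1 - frac)


def select_features(rows, n_features, k):
    """Stand-in for the search step: keep the k highest-scoring features."""
    ranked = sorted(range(n_features),
                    key=lambda j: (-feature_score(rows, j), j))
    return sorted(ranked[:k])


def adaptive_selection(data, n_features, k,
                       start_rate=0.05, growth=2.0, seed=0):
    """Grow the sample rate until two successive samples agree on the
    selected features (assumed stopping rule), or all data is used."""
    rng = random.Random(seed)
    rate = start_rate
    previous = None
    while True:
        sample = sample_instances(data, rate, rng)
        chosen = select_features(sample, n_features, k)
        if chosen == previous or rate >= 1.0:
            return chosen, rate
        previous = chosen
        rate = min(1.0, rate * growth)


if __name__ == "__main__":
    toy = make_toy_data(2000, seed=1)
    features, final_rate = adaptive_selection(toy, n_features=6, k=2)
    print("selected features:", features, "final sample rate:", final_rate)
```

The point of the sketch is only the control flow: each evaluation touches a fraction of the instances, and the fraction is increased when the result has not yet stabilized. A real implementation would replace `select_features` with the optimization procedure and use a statistically grounded criterion for the sample rate.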