Genetic algorithms in feature and instance selection

Authors: Chih-Fong Tsai; William Eberle; Chi-Yuan Chu
Affiliations: Department of Information Management, National Central University, Taiwan (Chih-Fong Tsai, Chi-Yuan Chu); Department of Computer Science, Tennessee Technological University, USA (William Eberle)
Venue: Knowledge-Based Systems
Year: 2013

Abstract

Feature selection and instance selection are two important data preprocessing steps in data mining: the former aims to remove irrelevant and/or redundant features from a given dataset, and the latter aims to discard faulty data samples. Genetic algorithms have been widely used for both tasks in related studies. However, the two tasks are generally considered separately in the literature, so it is unknown how classification performance differs when feature and instance selection are performed together versus when only feature selection or only instance selection is performed. Therefore, this study applies genetic-algorithm-based feature selection and instance selection in different orders (i.e., with different priorities) and examines the resulting classification performance on datasets from different domains. Experimental results on four small- and large-scale datasets containing various numbers of features and data samples show that performing both feature and instance selection usually makes the classifiers (i.e., support vector machines (SVM) and k-nearest neighbor (k-NN)) perform slightly worse than performing feature selection or instance selection individually. However, while there is no significant difference in classification accuracy among these data preprocessing methods, combining feature and instance selection greatly reduces the computational effort of training the classifiers compared with performing feature selection or instance selection alone. Considering both classification effectiveness and efficiency, we demonstrate that performing feature selection first and instance selection second is the optimal solution for data preprocessing in data mining. Both the SVM and k-NN classifiers provide classification accuracy similar to that of the baselines (i.e., classifiers trained without data preprocessing). Which data preprocessing task to perform for datasets of different scales is also discussed.
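To make the "feature selection first, instance selection second" ordering concrete, here is a minimal sketch of GA-based wrapper selection under stated assumptions. A binary chromosome masks columns (features) in the first search and rows (training instances) in the second. Fitness is taken to be 1-NN validation accuracy minus a small size penalty. The GA uses binary tournament selection, one-point crossover, bit-flip mutation, and elitism. These operators, the parameter values (population 20, 30 generations, penalty 0.01), the toy data, and the names `knn1_accuracy`, `genetic_mask_search`, and `select_features_then_instances` are hypothetical choices for illustration, not the authors' configuration. A 1-NN classifier stands in for the SVM and k-NN classifiers used in the study.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def knn1_accuracy(train_X: Matrix, train_y: Sequence[int],
                  test_X: Matrix, test_y: Sequence[int],
                  features: Sequence[int]) -> float:
    """Accuracy of a 1-nearest-neighbour classifier restricted to `features`."""
    if not train_X or not features:
        return 0.0  # an empty feature or instance subset cannot classify anything
    correct = 0
    for x, y in zip(test_X, test_y):
        # Squared Euclidean distance over the selected columns only.
        dists = [sum((x[f] - t[f]) ** 2 for f in features) for t in train_X]
        nearest = min(range(len(train_X)), key=dists.__getitem__)
        correct += int(train_y[nearest] == y)
    return correct / len(test_y)


def genetic_mask_search(length: int, fitness: Callable[[List[int]], float],
                        pop_size: int = 20, generations: int = 30,
                        crossover_rate: float = 0.8,
                        rng: Optional[random.Random] = None) -> List[int]:
    """Generational GA over bit strings (hypothetical operator choices):
    binary tournament selection, one-point crossover, bit-flip mutation, elitism."""
    rng = rng or random.Random(0)
    mutation_rate = 1.0 / length
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best, best_fit = pop[0][:], float("-inf")

    def tournament(scored: List[Tuple[float, List[int]]]) -> List[int]:
        a, b = rng.sample(scored, 2)
        return (a if a[0] >= b[0] else b)[1]

    for gen in range(generations + 1):
        scored = [(fitness(ind), ind) for ind in pop]
        for fit, ind in scored:
            if fit > best_fit:  # keep the best mask seen so far
                best, best_fit = ind[:], fit
        if gen == generations:
            break  # final population has been evaluated; stop breeding
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if length > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, length)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation with probability 1/length per gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    return best


def indices(mask: List[int]) -> List[int]:
    return [i for i, g in enumerate(mask) if g]


def select_features_then_instances(X_tr: Matrix, y_tr: Sequence[int],
                                   X_val: Matrix, y_val: Sequence[int],
                                   penalty: float = 0.01, seed: int = 0):
    """Feature selection first, then instance selection on the reduced features."""
    n_feat, n_inst = len(X_tr[0]), len(X_tr)

    def fs_fitness(mask: List[int]) -> float:
        feats = indices(mask)
        return knn1_accuracy(X_tr, y_tr, X_val, y_val, feats) - penalty * len(feats) / n_feat

    feats = indices(genetic_mask_search(n_feat, fs_fitness, rng=random.Random(seed)))
    feats = feats or list(range(n_feat))  # fall back to all features if none survive

    def is_fitness(mask: List[int]) -> float:
        rows = indices(mask)
        acc = knn1_accuracy([X_tr[i] for i in rows], [y_tr[i] for i in rows],
                            X_val, y_val, feats)
        return acc - penalty * len(rows) / n_inst

    rows = indices(genetic_mask_search(n_inst, is_fitness, rng=random.Random(seed + 1)))
    rows = rows or list(range(n_inst))  # fall back to all instances if none survive
    return feats, rows


if __name__ == "__main__":
    data_rng = random.Random(42)

    def make(n: int):
        # Hypothetical toy data: feature 0 separates the classes, features 1-4 are noise.
        X = [[(i % 2) * 3.0 + data_rng.gauss(0, 0.5)] + [data_rng.uniform(0, 5) for _ in range(4)]
             for i in range(n)]
        return X, [i % 2 for i in range(n)]

    X_tr, y_tr = make(30)
    X_val, y_val = make(15)
    feats, rows = select_features_then_instances(X_tr, y_tr, X_val, y_val)
    print("features kept:", feats)
    print("instances kept:", len(rows), "of", len(X_tr))
    print("validation accuracy:", knn1_accuracy([X_tr[i] for i in rows],
                                                [y_tr[i] for i in rows], X_val, y_val, feats))
```

Running the file prints the retained feature indices, the number of retained instances, and the resulting validation accuracy; the exact values depend on the seed and the toy data. Because the instance search runs on the already-reduced feature set, each of its fitness evaluations computes distances over fewer columns. This illustrates why the ordering affects cost, but it does not reproduce the paper's experiments.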