Feature selection and instance selection are two important data preprocessing steps in data mining: the former aims to remove irrelevant and/or redundant features from a given dataset, and the latter to discard faulty data samples. Genetic algorithms have been widely used for both tasks in related studies. However, the two tasks are generally considered separately in the literature, and it is unknown how classification performance differs between performing both feature and instance selection and performing feature selection or instance selection alone. The aim of this study is therefore to perform feature selection and instance selection based on genetic algorithms, applied in different orders of priority, and to examine the resulting classification performance over datasets from different domains. Experimental results on four small- and large-scale datasets containing various numbers of features and data samples show that performing both feature and instance selection usually makes the classifiers (i.e., support vector machines and k-nearest neighbor) perform slightly worse than performing feature selection or instance selection individually. However, while there is no significant difference in classification accuracy between these data preprocessing methods, combining feature and instance selection greatly reduces the computational effort of training the classifiers compared with performing feature selection or instance selection individually. Considering both classification effectiveness and efficiency, we demonstrate that performing feature selection first and instance selection second is the optimal solution for data preprocessing in data mining. Both the SVM and k-NN classifiers achieve classification accuracy similar to that of the baselines (i.e., those without data preprocessing). The decisions regarding which data preprocessing task to perform for different dataset scales are also discussed.
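The "feature selection first, instance selection second" ordering can be illustrated with a minimal sketch. This is not the implementation used in the study: the abstract does not specify the GA operators, fitness function, or parameters, so everything below is an assumption made for illustration, including the bit-string encoding, tournament selection, one-point crossover, the population size and mutation rate, the use of a plain 1-NN classifier as the fitness evaluator, and the helper names `run_ga`, `knn_accuracy`, `feature_then_instance`, and `make_toy_data`.

```python
import random

def knn_accuracy(train, test, feature_mask):
    """1-NN accuracy of `test` against `train`, using only the masked features."""
    cols = [j for j, keep in enumerate(feature_mask) if keep]
    if not cols or not train:
        return 0.0  # an empty feature set or training set cannot classify
    correct = 0
    for x, y in test:
        nearest = min(train, key=lambda row: sum((row[0][j] - x[j]) ** 2 for j in cols))
        correct += nearest[1] == y
    return correct / len(test)

def run_ga(n_bits, fitness, rng, pop_size=20, generations=15, p_mut=0.05):
    """Tiny generational GA over bit strings (all settings are assumed, not from the paper)."""
    pop = [[rng.random() < 0.5 for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Binary tournament: the fitter of two random individuals is the parent.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best individual over unchanged
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits) if n_bits > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [(not g) if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best

def feature_then_instance(train, valid, rng):
    """Stage 1 selects features; stage 2 selects instances using the chosen features."""
    n_features = len(train[0][0])
    feat_mask = run_ga(n_features, lambda m: knn_accuracy(train, valid, m), rng)

    def instance_fitness(mask):
        subset = [row for row, keep in zip(train, mask) if keep]
        return knn_accuracy(subset, valid, feat_mask)

    inst_mask = run_ga(len(train), instance_fitness, rng)
    reduced = [row for row, keep in zip(train, inst_mask) if keep]
    return feat_mask, reduced

def make_toy_data(n, rng):
    """Hypothetical data: feature 0 tracks the class label, features 1 and 2 are noise."""
    rows = []
    for i in range(n):
        label = i % 2
        rows.append(([label + rng.gauss(0, 0.1), rng.random(), rng.random()], label))
    return rows

rng = random.Random(0)
train, valid = make_toy_data(30, rng), make_toy_data(20, rng)
feat_mask, reduced = feature_then_instance(train, valid, rng)
print("selected features:", feat_mask)
print("instances kept:", len(reduced), "of", len(train))
print("validation accuracy:", knn_accuracy(reduced, valid, feat_mask))
```

Running the stages in this order means the instance-selection GA evaluates candidates on an already reduced feature set, which is one plausible reason the combined pipeline lowers training cost. Swapping the two calls to `run_ga` gives the opposite ordering for comparison.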