A scalable approach to simultaneous evolutionary instance and feature selection

Authors:
Nicolás García-Pedrajas; Aida de Haro-García; Javier Pérez-Rodríguez
Affiliations:
Department of Computing and Numerical Analysis, Edificio Einstein, 3a Planta, University of Córdoba, Campus de Rabanales, 14071 Córdoba, Spain (all three authors)
Venue:
Information Sciences: an International Journal
Year:
2013

Abstract

Current research continually produces enormous amounts of data, which challenges data mining algorithms. Many problems in very active research areas, such as bioinformatics, security and intrusion detection, and text mining, involve large or very large datasets that many data mining algorithms handle poorly. One way to address very large datasets is data reduction, and among the most useful data reduction methods is simultaneous instance and feature selection. This method considerably reduces the training data while maintaining, or even improving, the performance of the data mining algorithm. However, it scales poorly, even for medium-sized datasets. In this paper, we propose a new evolutionary simultaneous instance and feature selection algorithm that scales to millions of instances and thousands of features. The proposal is based on the divide-and-conquer principle combined with bookkeeping. The divide-and-conquer principle allows the algorithm to run in linear time. Furthermore, the proposed method is easy to implement in a parallel environment and can work without loading the entire dataset into memory. Using 50 medium-sized datasets, we show that our method matches the results of state-of-the-art instance and feature selection methods while significantly reducing the time requirements. Using 13 very large datasets, we show that our proposal scales to millions of instances and thousands of features.
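
The divide-and-conquer idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not describe the evolutionary search or define "bookkeeping", so the code below assumes a trivial stand-in selector (`toy_selector`, hypothetical) and reads bookkeeping as plain vote counters. The names `divide_and_conquer_select`, `block_size`, `rounds` and `threshold` are also assumptions made for illustration. Because every block has a fixed size, the work per round grows linearly with the number of instances, and each block could be processed independently or streamed from disk.

```python
import random
from collections import Counter


def toy_selector(X, y, inst_ids, feat_ids):
    """Hypothetical stand-in for the evolutionary search on one block.

    Keeps a feature if it is not constant inside the block, and keeps an
    instance if its nearest neighbour inside the block has the same label.
    """
    kept_feats = [f for f in feat_ids
                  if len({X[i][f] for i in inst_ids}) > 1]
    if not kept_feats:  # avoid an empty feature set
        kept_feats = list(feat_ids)

    def dist(a, b):
        # Squared Euclidean distance over the kept features only.
        return sum((X[a][f] - X[b][f]) ** 2 for f in kept_feats)

    kept_insts = []
    for i in inst_ids:
        others = [j for j in inst_ids if j != i]
        if not others:  # a block with a single instance keeps it
            kept_insts.append(i)
            continue
        nearest = min(others, key=lambda j: dist(i, j))
        if y[nearest] == y[i]:
            kept_insts.append(i)
    return kept_insts, kept_feats


def divide_and_conquer_select(X, y, block_size, rounds=5,
                              threshold=0.5, seed=0):
    """Split the data into small blocks, select inside each block,
    and combine the per-block decisions by voting (assumed bookkeeping)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    inst_votes, feat_votes, feat_seen = Counter(), Counter(), Counter()

    for _ in range(rounds):
        order = list(range(n))
        rng.shuffle(order)  # a different random partition in each round
        for start in range(0, n, block_size):
            block = order[start:start + block_size]
            kept_i, kept_f = toy_selector(X, y, block, range(d))
            inst_votes.update(kept_i)
            feat_votes.update(kept_f)
            feat_seen.update(range(d))  # every block sees every feature

    # Each instance falls in exactly one block per round.
    instances = [i for i in range(n)
                 if inst_votes[i] / rounds >= threshold]
    features = [f for f in range(d)
                if feat_votes[f] / feat_seen[f] >= threshold]
    return instances, features
```

Calling `divide_and_conquer_select(X, y, block_size=100)` on a list of numeric rows `X` and labels `y` returns the indices of the retained instances and features. Swapping `toy_selector` for a real search procedure would leave the partition-and-vote structure unchanged.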