Evolutionary-based selection of generalized instances for imbalanced classification

Authors:
Salvador García;Joaquín Derrac;Isaac Triguero;Cristóbal J. Carmona;Francisco Herrera
Affiliations:
University of Jaén, Department of Computer Science, 23071 Jaén, Spain;University of Granada, Department of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center on Information and Communications Technology), 18071 Granada, Spain;University of Granada, Department of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center on Information and Communications Technology), 18071 Granada, Spain;University of Jaén, Department of Computer Science, 23071 Jaén, Spain;University of Granada, Department of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center on Information and Communications Technology), 18071 Granada, Spain
Venue:
Knowledge-Based Systems
Year:
2012

Abstract

In supervised classification, many real-world problems involve data that are not evenly distributed among the classes. Such data sets are called imbalanced. One of the most widely used techniques for dealing with this problem is to preprocess the data before the learning process. This paper proposes a method belonging to the nested generalized exemplar family, which learns by storing objects in Euclidean n-space. New data are classified by computing their distance to the nearest generalized exemplar. The method is optimized by selecting the most suitable generalized exemplars with evolutionary algorithms. An experimental analysis is carried out over a wide range of highly imbalanced data sets, using the statistical tests suggested in the specialized literature. The results show that our evolutionary proposal outperforms other classic and recent models in accuracy and requires storing fewer generalized examples.
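
The classification step described in the abstract, assigning a new point the class of its nearest generalized exemplar, can be illustrated with a minimal sketch. It assumes that a generalized exemplar is an axis-parallel hyperrectangle and that the distance from a point to it is the Euclidean distance to its closest face (zero if the point lies inside). The names `Hyperrectangle`, `distance`, and `classify` are hypothetical and chosen for illustration; the paper's evolutionary selection of exemplars is not shown.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Sequence


@dataclass
class Hyperrectangle:
    """Hypothetical generalized exemplar: an axis-parallel box with a class label."""
    lower: Sequence[float]  # minimum value per attribute
    upper: Sequence[float]  # maximum value per attribute
    label: str


def distance(point: Sequence[float], rect: Hyperrectangle) -> float:
    """Euclidean distance from a point to a hyperrectangle (0 if inside)."""
    total = 0.0
    for x, lo, hi in zip(point, rect.lower, rect.upper):
        if x < lo:
            gap = lo - x      # point lies below the box on this attribute
        elif x > hi:
            gap = x - hi      # point lies above the box on this attribute
        else:
            gap = 0.0         # point is within the box's range here
        total += gap * gap
    return sqrt(total)


def classify(point: Sequence[float], exemplars: Sequence[Hyperrectangle]) -> str:
    """Return the label of the nearest generalized exemplar."""
    return min(exemplars, key=lambda rect: distance(point, rect)).label


# Toy exemplar set (illustrative values, not taken from the paper).
exemplars = [
    Hyperrectangle((0.0, 0.0), (1.0, 1.0), "majority"),
    Hyperrectangle((3.0, 3.0), (4.0, 4.0), "minority"),
]

print(classify((2.8, 2.9), exemplars))
```

In this sketch a single stored instance is simply a degenerate hyperrectangle with `lower == upper`, so ordinary nearest-neighbor classification is a special case.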