Evolutionary undersampling for classification with imbalanced datasets: Proposals and taxonomy

Authors:
Salvador García;Francisco Herrera
Affiliations:
Department of Computer Science and Artificial Intelligence, University of Granada, Granada, 18071, Spain. salvagl@decsai.ugr.es;Department of Computer Science and Artificial Intelligence, University of Granada, Granada, 18071, Spain. herrera@decsai.ugr.es
Venue:
Evolutionary Computation
Year:
2009



Abstract

Learning with imbalanced data is one of the recent challenges in machine learning. Proposed solutions include modifying the learning methods themselves and applying a preprocessing stage. Preprocessing that balances the data follows one of two approaches: reducing the set of examples (undersampling) or replicating minority class examples (oversampling). Undersampling an imbalanced dataset can be viewed as a prototype selection procedure whose purpose is to balance the dataset and achieve a high classification rate while avoiding bias toward majority class examples. Evolutionary algorithms have shown good results in classical prototype selection, where the fitness function is associated with the classification and reduction rates. In this paper, we propose a set of methods called evolutionary undersampling that take the nature of the problem into consideration and use different fitness functions to obtain a good trade-off between class balance and performance. The study includes a taxonomy of the approaches and an overall comparison of our models with state-of-the-art undersampling methods. The results, contrasted using nonparametric statistical procedures, show that evolutionary undersampling outperforms the nonevolutionary models as the degree of imbalance increases.
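
To make the idea of a fitness function that trades class balance against performance more concrete, the following is a minimal Python sketch. The abstract does not give any formula, so everything here is an assumption made for illustration: the binary mask over majority class examples, the geometric mean of per-class accuracy under a leave-one-out 1-NN classifier as the performance term, the penalty weight of 0.2, the names `g_mean`, `fitness`, and `evolve`, and the toy dataset. The search loop is a simple bit-flip hill climber used as a stand-in, not the evolutionary algorithm used by the authors.

```python
import math
import random

# Toy data (hypothetical): label 1 is the minority class, label 0 the majority.
DATA = [((0.0, 0.0), 1), ((0.0, 1.0), 1),
        ((5.0, 5.0), 0), ((5.0, 6.0), 0), ((6.0, 5.0), 0),
        ((6.0, 6.0), 0), ((5.5, 5.5), 0), ((6.0, 7.0), 0)]

def nearest_label(data, selected, query_idx):
    """Label of the selected instance nearest to data[query_idx], excluding itself."""
    query, _ = data[query_idx]
    best_label, best_dist = None, math.inf
    for i in selected:
        if i == query_idx:
            continue  # leave-one-out: an instance may not vote for itself
        x, y = data[i]
        dist = sum((a - b) ** 2 for a, b in zip(x, query))
        if dist < best_dist:
            best_dist, best_label = dist, y
    return best_label

def g_mean(data, selected, minority=1):
    """Geometric mean of minority and majority accuracy of 1-NN over `selected`."""
    hits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for idx, (_, y) in enumerate(data):
        is_minority = (y == minority)
        totals[is_minority] += 1
        if nearest_label(data, selected, idx) == y:
            hits[is_minority] += 1
    return math.sqrt((hits[True] / totals[True]) * (hits[False] / totals[False]))

def fitness(mask, data, minority=1, penalty=0.2):
    """Performance minus an imbalance penalty (assumed form, not from the abstract).

    mask[k] == 1 keeps the k-th majority example; all minority examples are kept.
    """
    minority_idx = [i for i, (_, y) in enumerate(data) if y == minority]
    majority_idx = [i for i, (_, y) in enumerate(data) if y != minority]
    kept = [i for i, bit in zip(majority_idx, mask) if bit]
    if not kept:
        return float("-inf")  # a subset with no majority examples is invalid
    imbalance = abs(1 - len(minority_idx) / len(kept))
    return g_mean(data, minority_idx + kept, minority) - penalty * imbalance

def evolve(data, generations=200, seed=0, minority=1):
    """Bit-flip hill climbing over masks; a stand-in for a real evolutionary search."""
    rng = random.Random(seed)
    n_majority = sum(1 for _, y in data if y != minority)
    best = [1] * n_majority  # start from the full, imbalanced training set
    best_fit = fitness(best, data, minority)
    for _ in range(generations):
        child = best[:]
        j = rng.randrange(n_majority)
        child[j] = 1 - child[j]
        child_fit = fitness(child, data, minority)
        if child_fit >= best_fit:
            best, best_fit = child, child_fit
    return best, best_fit

if __name__ == "__main__":
    mask, score = evolve(DATA)
    print("kept majority examples:", sum(mask), "fitness:", round(score, 4))
```

On this toy data, a mask that keeps as many majority examples as there are minority examples incurs no penalty, while keeping all six majority examples is penalized even though classification is perfect. Different choices of performance measure and penalty term would give different variants of the same scheme.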