A dynamic over-sampling procedure based on sensitivity for multi-class problems

  • Authors: Francisco Fernández-Navarro, César Hervás-Martínez, Pedro Antonio Gutiérrez

  • Affiliations: Department of Computer Science and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, 3rd Floor, 14071 Córdoba, Spain

  • Venue: Pattern Recognition
  • Year: 2011


Abstract

Classification with imbalanced datasets poses a new challenge for researchers in machine learning. The problem arises when the number of patterns representing one of the classes of the dataset (usually the concept of interest) is much lower than for the remaining classes, so the learning model must be adapted to this situation, which is very common in real applications. In this paper, a dynamic over-sampling procedure is proposed for improving the classification of imbalanced datasets with more than two classes. The procedure is incorporated into a memetic algorithm (MA) that optimizes radial basis function neural networks (RBFNNs). To handle class imbalance, the training data are resampled in two stages. In the first stage, an over-sampling procedure is applied to the minority class to partially balance the class sizes. Then the MA is run, and the data are over-sampled again in different generations of the evolution, generating new patterns of the minimum-sensitivity class (the class with the worst accuracy for the best RBFNN of the population). The proposed methodology is tested on 13 imbalanced benchmark classification datasets from well-known machine learning problems and one complex problem of microbial growth, and is compared to other neural network methods specifically designed for handling imbalanced data. These methods include different over-sampling procedures in the preprocessing stage, a threshold-moving method in which the output threshold is moved toward inexpensive classes, and ensemble approaches combining the models obtained with these techniques. The results show that our proposal improves sensitivity in the generalization set while maintaining a high accuracy level and a good classification level for each class.
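The core idea of the dynamic step, identifying the class with the lowest per-class accuracy (sensitivity) for the current best model and generating new patterns for it, can be sketched as follows. This is an illustrative sketch only, not the paper's exact generator: the function names are hypothetical, and synthetic patterns are created here by interpolating between random pairs of same-class patterns (a SMOTE-like heuristic).

```python
import random
from collections import defaultdict

def per_class_sensitivity(y_true, y_pred):
    """Sensitivity (recall) of each class: correct predictions / class size."""
    totals, hits = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {c: hits[c] / totals[c] for c in totals}

def oversample_min_sensitivity_class(X, y, y_pred, n_new, seed=0):
    """Add n_new synthetic patterns for the class with the lowest sensitivity,
    obtained by linear interpolation between random pairs of its patterns
    (SMOTE-like heuristic; the paper's actual generator may differ)."""
    rng = random.Random(seed)
    sens = per_class_sensitivity(y, y_pred)
    worst = min(sens, key=sens.get)          # minimum-sensitivity class
    pool = [x for x, c in zip(X, y) if c == worst]
    new_X, new_y = [], []
    for _ in range(n_new):
        a, b = rng.choice(pool), rng.choice(pool)
        lam = rng.random()                   # interpolation coefficient in [0, 1)
        new_X.append([ai + lam * (bi - ai) for ai, bi in zip(a, b)])
        new_y.append(worst)
    return X + new_X, y + new_y
```

In the paper's scheme, a step like this would be invoked at selected generations of the MA, with `y_pred` coming from the best RBFNN in the current population, so the resampling target adapts as the population evolves.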