Class imbalance limits the performance of most learning algorithms, which cannot cope with large differences between the number of samples in each class and therefore achieve low predictive accuracy on the minority class. Several papers have proposed algorithms aiming at more balanced performance. However, balancing the recognition accuracies of the classes often harms global accuracy: the accuracy on the minority class increases while the accuracy on the majority class decreases. This paper proposes an approach to overcome this limitation: for each classification act, it chooses between the output of a classifier trained on the original skewed distribution and the output of a classifier trained with a learning method that addresses the problem of imbalanced data. The choice is driven by a parameter whose value maximizes, on a validation set, two objective functions: the global accuracy and the accuracies of the individual classes. A series of experiments on ten public datasets with different proportions between the majority and minority classes shows that the proposed approach provides more balanced recognition accuracies than classifiers trained with traditional learning methods for imbalanced data, as well as higher global accuracy than classifiers trained on the original skewed distribution.
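The abstract leaves the selection parameter unspecified; the sketch below is one hypothetical reading, not the paper's actual method. It assumes the parameter is a confidence threshold: the classifier trained on the skewed distribution answers unless its confidence falls below a threshold, in which case the imbalance-aware classifier answers. The threshold is tuned on a validation set by maximizing the sum of global accuracy and balanced (per-class) accuracy. All function and variable names are illustrative.

```python
def combined_predict(x, c_skew, c_bal, sigma):
    """Return c_skew's label unless its confidence is below sigma,
    in which case fall back to the imbalance-aware classifier c_bal.
    Both classifiers map a sample to (label, confidence)."""
    label, conf = c_skew(x)
    if conf < sigma:
        label, _ = c_bal(x)
    return label

def tune_sigma(val_set, c_skew, c_bal, candidates):
    """Pick the candidate sigma maximizing, on the validation set,
    the sum of global accuracy and mean per-class accuracy
    (one plausible way to combine the two objectives)."""
    best_sigma, best_score = candidates[0], -1.0
    for sigma in candidates:
        per_class = {}  # class label -> (correct count, total count)
        for x, y in val_set:
            ok = combined_predict(x, c_skew, c_bal, sigma) == y
            hits, total = per_class.get(y, (0, 0))
            per_class[y] = (hits + ok, total + 1)
        global_acc = sum(h for h, _ in per_class.values()) / len(val_set)
        balanced_acc = sum(h / t for h, t in per_class.values()) / len(per_class)
        score = global_acc + balanced_acc
        if score > best_score:
            best_sigma, best_score = sigma, score
    return best_sigma
```

With toy classifiers (a skew-trained one that always predicts the majority class, confidently only on majority-like samples, and a balanced one that recovers the minority class), the tuned threshold routes low-confidence samples to the imbalance-aware classifier while leaving confident majority-class decisions untouched.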