Several studies have pointed out that class imbalance is a bottleneck for the performance of standard supervised learning systems. However, a complete understanding of how this problem affects learning performance is still lacking. In previous work we showed that performance degradation is not caused solely by class imbalance, but is also related to the degree of class overlap. In this work, we take our research a step further by investigating sampling strategies that aim to balance the training set. Our results show that these sampling strategies usually improve performance on highly imbalanced data sets with highly overlapped classes. In addition, over-sampling methods seem to outperform under-sampling methods.
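The two families of sampling strategies compared above can be illustrated with a minimal sketch. The helper names (`random_oversample`, `random_undersample`) and the toy data are illustrative, not from the paper; real studies typically use more elaborate methods such as SMOTE, which synthesizes new minority examples rather than duplicating existing ones.

```python
import random

def random_oversample(majority, minority, seed=0):
    # Illustrative helper: duplicate randomly chosen minority
    # examples until both classes have the same size.
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

def random_undersample(majority, minority, seed=0):
    # Illustrative helper: randomly discard majority examples
    # until both classes have the same size.
    rng = random.Random(seed)
    kept = rng.sample(majority, len(minority))
    return kept + minority

# Toy imbalanced training set: 8 majority (label 0) vs 2 minority (label 1).
maj = [([float(i), 0.0], 0) for i in range(8)]
mino = [([float(i), 1.0], 1) for i in range(2)]

balanced_over = random_oversample(maj, mino)    # 16 examples, 8 per class
balanced_under = random_undersample(maj, mino)  # 4 examples, 2 per class
```

Over-sampling keeps all available information at the cost of repeated minority examples (and a larger training set); under-sampling discards potentially useful majority examples, which is one intuition for why over-sampling tends to perform better in the reported results.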