Balancing strategies and class overlapping

  • Authors: Gustavo E. A. P. A. Batista; Ronaldo C. Prati; Maria C. Monard

  • Affiliation (all authors): Institute of Mathematics and Computer Science, University of São Paulo, São Carlos (SP), Brazil

  • Venue: IDA'05, Proceedings of the 6th International Conference on Advances in Intelligent Data Analysis
  • Year: 2005

Abstract

Several studies have pointed out that class imbalance is a bottleneck in the performance achieved by standard supervised learning systems. However, a complete understanding of how this problem affects learning performance is still lacking. In previous work, we found that performance degradation is not caused solely by class imbalance but is also related to the degree of class overlapping. In this work, we take our research a step further by investigating sampling strategies that aim to balance the training set. Our results show that these sampling strategies usually improve performance on highly imbalanced data sets with highly overlapped classes. In addition, over-sampling methods seem to outperform under-sampling methods.
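
The abstract does not say which sampling methods were evaluated, so the following is only a minimal sketch of the two general families it names, using the simplest random variants as an assumption. The function names `random_over_sample` and `random_under_sample` and the helper `_group_by_class` are hypothetical and are not taken from the paper.

```python
import random


def _group_by_class(X, y):
    """Group feature rows by their class label (hypothetical helper)."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return by_class


def random_over_sample(X, y, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sample with replacement to fill the gap to the majority size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def random_under_sample(X, y, seed=0):
    """Discard randomly chosen rows of the larger classes until every
    class has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sample without replacement to keep `target` distinct rows.
        for features in rng.sample(rows, target):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out
```

Either function should be applied to the training set only, so that the test set keeps its original class distribution. Over-sampling keeps every original example at the cost of duplicates, while under-sampling discards majority-class examples.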