Cluster-Based sampling approaches to imbalanced data distributions

  • Authors:
  • Show-Jane Yen;Yue-Shi Lee

  • Affiliations:
  • Department of Computer Science and Information Engineering, Ming Chuan University, Gwei Shan District, Taoyuan County, Taiwan;Department of Computer Science and Information Engineering, Ming Chuan University, Gwei Shan District, Taoyuan County, Taiwan

  • Venue:
  • DaWaK'06 Proceedings of the 8th international conference on Data Warehousing and Knowledge Discovery
  • Year:
  • 2006

Abstract

In classification problems, the training data significantly influence classification accuracy. When the data set is highly imbalanced, classification algorithms tend to degenerate by assigning all cases to the most common outcome. Hence, it is important to select suitable training data for classification under an imbalanced class distribution. In this paper, we propose cluster-based under-sampling approaches that select representative data as training data to improve classification accuracy in the imbalanced class distribution environment. A neural network model is used as the base classification algorithm. The experimental results show that our cluster-based under-sampling approaches outperform the other under-sampling techniques from previous studies.
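
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea of cluster-based under-sampling, not the authors' method. It assumes that the majority class is clustered with plain k-means and that each cluster contributes samples in proportion to its size, until the reduced majority set is about as large as the minority class. The function names `kmeans` and `cluster_undersample`, the parameter `k`, and the proportional quota rule are all hypothetical choices made for illustration.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for each point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign


def cluster_undersample(majority, minority, k=3, seed=0):
    """Return a reduced majority set of roughly len(minority) samples.

    Assumption (not from the paper): each cluster's quota is proportional
    to its share of the majority class, with at least one sample taken
    from every non-empty cluster.
    """
    assign = kmeans(majority, k, seed=seed)
    rng = random.Random(seed)
    target = len(minority)
    selected = []
    for c in range(k):
        members = [p for p, a in zip(majority, assign) if a == c]
        if not members:
            continue
        quota = max(1, round(target * len(members) / len(majority)))
        selected.extend(rng.sample(members, min(quota, len(members))))
    return selected


if __name__ == "__main__":
    # Synthetic data: 90 majority points in three blobs, 10 minority points.
    rng = random.Random(42)
    majority = [
        (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in [(0, 0), (5, 5), (0, 5)]
        for _ in range(30)
    ]
    minority = [(rng.gauss(2, 0.5), rng.gauss(2, 0.5)) for _ in range(10)]
    reduced = cluster_undersample(majority, minority, k=3)
    print(len(majority), "majority samples reduced to", len(reduced))
```

Because every cluster is represented, the reduced majority set keeps the rough shape of the original distribution, which random under-sampling does not guarantee. The balanced set (reduced majority plus all minority samples) would then be passed to the classifier, a neural network in the paper's experiments.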