Influence of class distribution on cost-sensitive learning: A case study of bankruptcy analysis

  • Authors:
  • Ning Chen;An Chen;Bernardete Ribeiro

  • Affiliations:
  • GECAD, Instituto Superior de Engenharia do Porto, Porto, Portugal;GECAD, Instituto Superior de Engenharia do Porto, Porto, Portugal and Institute of Policy and Management, Chinese Academy of Sciences, Beijing, China;CISUC, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2013

Abstract

Skewed class distributions and non-uniform misclassification costs are pervasive in real-world domains such as bankruptcy prediction, medical diagnosis, and intrusion detection. Although previous studies have shown that class imbalance learning and cost-sensitive learning can be handled in a unified framework, the influence of class distribution on cost-sensitive learning still needs clarification. In this paper, we investigate the effect of cost ratio, imbalance ratio, and sample size on classification performance using a real-world French bankruptcy database. The results show that the cost ratio and the level of class imbalance have a strong effect on prediction performance. A near-balanced training data set is favorable when a relatively uniform cost ratio is used, whereas a near-natural class distribution is favorable when a highly uneven cost ratio is used.
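
To make the two quantities varied in the study concrete, the following is a minimal sketch, not the authors' procedure. It assumes that label 1 marks a bankrupt firm (the minority class), that the imbalance ratio is the number of majority examples per minority example, and that the cost ratio is the cost of a missed bankruptcy relative to a false alarm. The function names `resample_to_ratio` and `total_cost` are hypothetical, and random undersampling is only one possible way to change the class distribution.

```python
import random


def resample_to_ratio(labels, imbalance_ratio, seed=0):
    """Undersample the majority class (label 0) so that
    n_majority / n_minority is approximately imbalance_ratio.
    Returns the sorted indices of the retained examples.
    Assumption: label 1 is the minority (bankrupt) class."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more majority examples than are available.
    n_major = min(len(majority), round(imbalance_ratio * len(minority)))
    return sorted(minority + rng.sample(majority, n_major))


def total_cost(y_true, y_pred, cost_ratio):
    """Total misclassification cost, assuming a false alarm costs 1
    and a missed bankruptcy (false negative) costs cost_ratio."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return cost_ratio * fn + fp


if __name__ == "__main__":
    # Toy labels: 10 bankrupt firms, 90 healthy firms (natural ratio 9:1).
    labels = [1] * 10 + [0] * 90
    balanced = resample_to_ratio(labels, imbalance_ratio=1.0)
    print(len(balanced))
    # One missed bankruptcy and one false alarm under a cost ratio of 5.
    print(total_cost([1, 1, 0, 0], [1, 0, 1, 0], cost_ratio=5))
```

Under these assumptions, a training set built with `imbalance_ratio=1.0` is balanced, while a value equal to the natural ratio keeps every example; a classifier trained on each can then be compared by `total_cost` at different cost ratios.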