The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients

  • Authors:
  • I-Cheng Yeh; Che-hui Lien

  • Affiliations:
  • Department of Information Management, Chung-Hua University, Hsin Chu 30067, Taiwan, ROC; Department of Management, Thompson Rivers University, Kamloops, BC, Canada

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2009

Abstract

This research examines customers' default payments in Taiwan and compares the predictive accuracy of the probability of default among six data mining methods. From the perspective of risk management, the predictive accuracy of the estimated probability of default is more valuable than the binary result of classification (credible or not credible clients). Because the real probability of default is unknown, this study presents the novel "Sorting Smoothing Method" to estimate it. With the real probability of default as the response variable (Y) and the predicted probability of default as the independent variable (X), simple linear regression (Y = A + BX) shows that the forecasting model produced by the artificial neural network has the highest coefficient of determination; its regression intercept (A) is close to zero and its regression coefficient (B) is close to one. Therefore, among the six data mining techniques, the artificial neural network is the only one that can accurately estimate the real probability of default.
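
The evaluation described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the Sorting Smoothing Method, so the version below is an assumption: records are sorted by predicted probability and the binary default outcome is averaged over a window of up to 2n+1 neighbours. The function names, the window size n=50, and the synthetic data are all hypothetical and stand in for a real model's output.

```python
import numpy as np

def sorting_smoothing(pred, outcome, n=50):
    """Assumed form of the method: sort by predicted probability, then
    average the binary outcome over up to n neighbours on each side."""
    pred = np.asarray(pred, dtype=float)
    order = np.argsort(pred, kind="stable")
    y_sorted = np.asarray(outcome, dtype=float)[order]
    # Prefix sums give every window total without an explicit loop.
    csum = np.concatenate(([0.0], np.cumsum(y_sorted)))
    idx = np.arange(len(y_sorted))
    lo = np.maximum(idx - n, 0)                  # window is truncated at the ends
    hi = np.minimum(idx + n + 1, len(y_sorted))
    smoothed = (csum[hi] - csum[lo]) / (hi - lo)
    return pred[order], smoothed

def fit_line(x, y):
    """Least-squares fit of Y = A + B*X; returns A, B and R squared."""
    B, A = np.polyfit(x, y, 1)
    resid = y - (A + B * x)
    r2 = 1.0 - resid.var() / y.var()
    return A, B, r2

# Synthetic data (not from the paper): a hypothetical, well-calibrated model
# whose predicted probability equals the true probability of default.
rng = np.random.default_rng(0)
true_p = rng.uniform(0.05, 0.95, 5000)
outcome = rng.random(5000) < true_p
x, y = sorting_smoothing(true_p, outcome, n=50)
A, B, r2 = fit_line(x, y)
print(f"A={A:.3f}  B={B:.3f}  R^2={r2:.3f}")
```

For a model whose predicted probabilities track the real ones, the fit should give A near zero, B near one, and a high coefficient of determination, which is the criterion the abstract applies to the six methods.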