From linear to non-linear kernel based classifiers for bankruptcy prediction

  • Authors:
  • Tony Van Gestel; Bart Baesens; David Martens

  • Affiliations:
  • Quantification and Pricing, Dexia Group, Belgium; Department of Decision Sciences & Information Management, K.U.Leuven, Belgium and Department of Business Administration and Public Management, University College Ghent, Ghent University, Belgium; Department of Decision Sciences & Information Management, K.U.Leuven, Belgium and University of Southampton, School of Management, UK

  • Venue:
  • Neurocomputing
  • Year:
  • 2010

Abstract

Bankruptcy prediction has been a topic of research for decades, within both the financial and the academic world. The implementation of international financial and accounting standards, such as Basel II and IFRS, as well as the recent credit crisis, has accentuated this topic even further. This paper describes both regularized and non-linear kernel variants of traditional discriminant analysis techniques, such as logistic regression, Fisher discriminant analysis (FDA) and quadratic discriminant analysis (QDA). In addition to a systematic description of these variants, we contribute to the literature by introducing kernel QDA and by providing a comprehensive benchmarking study of these classification techniques and their regularized and kernel versions for bankruptcy prediction using 10 real-life data sets. Performance is compared in terms of binary classification accuracy, relevant for evaluating yes/no credit decisions, and in terms of classification accuracy, relevant for pricing differentiated credit granting. The results clearly indicate a significant improvement for the kernel variants in both the percentage of correctly classified (PCC) test instances and the area under the ROC curve (AUC), and indicate that bankruptcy problems are weakly non-linear. On average, the best performance is achieved by LSSVM, closely followed by kernel quadratic discriminant analysis. Given the high impact of small improvements in performance, we show the relevance and importance of considering kernel techniques within this setting. Further experiments with backward input selection improve our results further. Finally, we experimentally investigate the relative ranking of the different categories of variables (liquidity, solvency, profitability and various), and as such provide new insights into the relative importance of these categories for predicting financial distress.
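
To make the two performance measures named in the abstract concrete, the following is a minimal sketch of how PCC and AUC could be computed from a classifier's scores. It is not the authors' evaluation code. The function names `pcc` and `auc`, the 0.5 threshold, the label convention (1 = bankrupt, 0 = healthy) and the toy data are all assumptions made for illustration. PCC is returned as a fraction rather than a percentage.

```python
def pcc(labels, scores, threshold=0.5):
    """Fraction of instances whose thresholded score matches the label.

    The threshold of 0.5 is an illustrative assumption, not a value
    taken from the paper.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive instance
    receives a higher score than a randomly chosen negative one,
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = bankrupt, 0 = healthy.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]

print(f"PCC = {pcc(labels, scores):.4f}")
print(f"AUC = {auc(labels, scores):.4f}")
```

PCC depends on the chosen threshold and so reflects a single yes/no decision rule, whereas AUC depends only on how the scores order the instances, which is why the two measures can rank classifiers differently.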