The application of brute force logistic regression to corporate credit scoring models: Evidence from Serbian financial statements

  • Authors:
  • Nebojsa Nikolic; Nevenka Zarkic-Joksimovic; Djordje Stojanovski; Iva Joksimovic

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2013

Quantified Score

Hi-index 12.05

Abstract

This paper proposes a brute force logistic regression (LR) modeling approach and uses it to develop a predictive credit scoring model for corporate entities. The modeling is based on 5 years of data from end-of-year financial statements of Serbian corporate entities, as well as default event data. To the best of our knowledge, no relevant research on the predictive power of financial ratios derived from Serbian financial statements has been published so far. This is also the first paper to generate 350 financial ratios as independent variables for default prediction across 7590 corporate entities. Many of the derived financial ratios are new and have not been discussed in the literature before. The weight of evidence (WOE) method is applied to transform and prepare the financial ratios for the brute force LR fitting simulations. A clustering method is used to shorten the long list of variables and to remove highly correlated financial ratios from the partitioned training and validation datasets. The clustering results show that the number of variables can be reduced to a short list of 24 financial ratios, which are then analyzed in terms of their power to predict default events. In this paper we propose the most predictive financial ratios from financial statements of Serbian corporate entities. The resulting short list serves as the main input for the brute force LR model simulations. According to the literature, the common practice for selecting variables for the final model is to run stepwise, forward, or backward LR. In this research, however, the brute force LR simulations fit all possible models comprising 5 to 14 independent variables from the short list of 24 financial ratios. The total number of simulated LR models is around 14 million. Each model was fitted through extensive and time-consuming brute force LR simulations using SAS® code written by the authors.
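The "around 14 million" figure can be checked by counting every subset of 5 to 14 variables drawn from 24 ratios. The sketch below is in Python rather than the authors' SAS code, and the function names are hypothetical; it only counts and enumerates candidate variable sets, and does not fit any model.

```python
from itertools import combinations
from math import comb

# Figures taken from the abstract.
N_RATIOS = 24                 # size of the short list of financial ratios
MIN_VARS, MAX_VARS = 5, 14    # smallest and largest model size considered


def count_candidate_models(n=N_RATIOS, k_min=MIN_VARS, k_max=MAX_VARS):
    """Count all variable subsets with between k_min and k_max members."""
    return sum(comb(n, k) for k in range(k_min, k_max + 1))


def iter_candidate_models(names, k_min, k_max):
    """Lazily yield each candidate variable subset as a tuple of names.

    Hypothetical helper: each yielded tuple would be handed to an LR
    fitting routine in a full brute force search.
    """
    for k in range(k_min, k_max + 1):
        yield from combinations(names, k)


if __name__ == "__main__":
    print(count_candidate_models())  # 14185135
```

Because the enumeration is a generator, the candidate sets never need to be held in memory at once, which matters at this scale.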
A total of 342,016 simulated models ("well-founded" models) satisfied the established credit scoring model validity conditions. The well-founded models were ranked according to GINI performance on the validation dataset. From this ranking, the model with the highest predictive power, consisting of 8 financial ratios, was selected and analyzed in terms of the receiver operating characteristic (ROC) curve, GINI, AIC, SC, LR fitting statistics, and correlation coefficients. The financial ratio constituents of that model are discussed and benchmarked against several models from the relevant literature.
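The ranking step can be illustrated with a short Python sketch. The abstract does not define GINI, so this assumes the definition common in credit scoring, GINI = 2 × AUC − 1, where AUC is the area under the ROC curve. The function names and the toy data layout are hypothetical, and the pairwise AUC computation is chosen for clarity, not for speed on large validation sets.

```python
def auc(scores, defaults):
    """Area under the ROC curve by pairwise comparison.

    scores: predicted default probabilities (higher means riskier).
    defaults: 1 for a defaulted entity, 0 otherwise.
    Ties between a defaulter and a non-defaulter count as half a win.
    """
    pos = [s for s, d in zip(scores, defaults) if d == 1]
    neg = [s for s, d in zip(scores, defaults) if d == 0]
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one non-defaulter")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def gini(scores, defaults):
    """Gini coefficient, assuming the usual relation GINI = 2 * AUC - 1."""
    return 2.0 * auc(scores, defaults) - 1.0


def rank_models(model_scores, defaults):
    """Return model names sorted by validation Gini, best first.

    model_scores: dict mapping a model name to its validation scores.
    """
    return sorted(
        model_scores,
        key=lambda name: gini(model_scores[name], defaults),
        reverse=True,
    )
```

For example, `rank_models({"a": [0.9, 0.8, 0.2, 0.1], "b": [0.1, 0.2, 0.8, 0.9]}, [1, 1, 0, 0])` places model `a` first, since it separates defaulters from non-defaulters perfectly.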