On the impact of disproportional samples in credit scoring models: An application to a Brazilian bank data

Authors:
Francisco Louzada;Paulo H. Ferreira-Silva;Carlos A. R. Diniz
Affiliations:
Universidade de São Paulo, SME-ICMC, São Carlos, Brazil;Universidade Federal de São Carlos, DEs, São Carlos, Brazil;Universidade Federal de São Carlos, DEs, São Carlos, Brazil
Venue:
Expert Systems with Applications: An International Journal
Year:
2012

Citing 3
Cited 1

Bioinformatics: the machine learning approach

Credit Scoring and Its Applications

A two-stage hybrid credit scoring model using artificial neural networks and multivariate adaptive regression splines

Expert Systems with Applications: An International Journal

The application of brute force logistic regression to corporate credit scoring models: Evidence from Serbian financial statements

Expert Systems with Applications: An International Journal

Abstract

Statistical methods have been widely employed to assess the capabilities of credit scoring classification models in order to reduce the risk of wrong decisions when granting credit facilities to clients. The predictive quality of a classification model can be evaluated with measures such as sensitivity, specificity, predictive values, accuracy, correlation coefficients, and information-theoretic measures such as relative entropy and mutual information. In this paper we analyze the performance of a naive logistic regression model (Hosmer & Lemeshow, 1989) and a logistic regression with state-dependent sample selection model (Cramer, 2004), both applied to simulated data. As a case study, the methodology is also illustrated on a data set extracted from a Brazilian bank portfolio. Our simulation results reveal no statistically significant difference in predictive capacity between the naive logistic regression models and the logistic regression with state-dependent sample selection models. However, there is a strong difference between the distributions of the default probabilities estimated by these two modeling techniques, with the naive logistic regression models always underestimating such probabilities, particularly in the presence of balanced samples.
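
As an illustration of the predictive-quality measures listed in the abstract, the following minimal sketch computes several of them from the four cells of a 2x2 confusion matrix. It is not the authors' code: the function name `classification_measures`, the choice of "default" as the positive class, the use of the Matthews correlation coefficient as the correlation measure, and the example counts are all assumptions made for illustration. Relative entropy is omitted because it requires the predicted probabilities rather than the classified counts.

```python
import math


def classification_measures(tp, fp, tn, fn):
    """Predictive-quality measures from confusion-matrix counts.

    Hypothetical helper for illustration. Assumes the positive class is
    "default" and that every row and column total is non-zero.
    """
    n = tp + fp + tn + fn

    sensitivity = tp / (tp + fn)   # defaults correctly flagged
    specificity = tn / (tn + fp)   # non-defaults correctly cleared
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / n

    # Matthews correlation coefficient between actual and predicted labels.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    # Mutual information (in bits) between actual and predicted labels.
    joint = {(1, 1): tp / n, (1, 0): fn / n, (0, 1): fp / n, (0, 0): tn / n}
    p_actual = {1: (tp + fn) / n, 0: (tn + fp) / n}
    p_pred = {1: (tp + fp) / n, 0: (tn + fn) / n}
    mutual_information = sum(
        p * math.log2(p / (p_actual[a] * p_pred[b]))
        for (a, b), p in joint.items()
        if p > 0  # empty cells contribute nothing to the sum
    )

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "mcc": mcc,
        "mutual_information": mutual_information,
    }


# Made-up counts, for illustration only (not taken from the paper).
measures = classification_measures(tp=40, fp=10, tn=130, fn=20)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Computing these measures for two fitted models on the same validation sample gives one simple way to compare their predictive capacity, which is a separate question from how well their estimated default probabilities are calibrated.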