Improving risk predictions by preprocessing imbalanced credit data
ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part II
Managing customer credit is an important issue in the banking industry, and it should be automated with a credit scoring model that can be trusted. This paper presents our solution to the PAKDD 2009 data mining competition as a case study of the credit scoring problem. Following a brief description of the data mining task, we discuss several challenges it poses, including an imbalanced dataset, missing values, and data transformation. After a series of preliminary experiments, logistic regression and AdaBoost were selected as the classifiers for this particular problem. An ensemble of the two classifiers was then created to achieve even better performance. The final result shows that our solution is effective and efficient, with an AUC value of 0.6535, the fifth best result among more than 100 competing teams.
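As a rough illustration of the evaluation described above, the following sketch computes AUC from scratch and blends the scores of two classifiers. The abstract does not say how the ensemble combined logistic regression and AdaBoost, so the weighted average used here is an assumption, and the function names, labels, and scores are all hypothetical toy values rather than competition data.

```python
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    Equals the probability that a randomly chosen positive is scored higher
    than a randomly chosen negative, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_ensemble(scores_a: Sequence[float],
                     scores_b: Sequence[float],
                     weight_a: float = 0.5) -> List[float]:
    """Blend two classifiers' scores with a weighted average (assumed scheme)."""
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(scores_a, scores_b)]


# Toy, made-up data: 1 = bad credit (minority class), 0 = good credit.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
# Hypothetical outputs standing in for logistic regression and AdaBoost.
lr_scores = [0.10, 0.20, 0.30, 0.60, 0.25, 0.15, 0.55, 0.70]
ada_scores = [0.20, 0.10, 0.65, 0.30, 0.35, 0.25, 0.60, 0.50]
blend = average_ensemble(lr_scores, ada_scores)

print("LR AUC:      ", auc(labels, lr_scores))
print("AdaBoost AUC:", auc(labels, ada_scores))
print("Blend AUC:   ", auc(labels, blend))
```

AUC depends only on how positives are ranked relative to negatives, not on the class ratio, which is one reason it is a common metric for imbalanced credit data. On this toy data the two models misrank different negatives, so the blend ranks better than either model alone; real data offers no such guarantee.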