Impact of noise on credit risk prediction: Does data quality really matter?

Authors:
Bhekisipho Twala
Affiliations:
Department of Electrical and Electronic Engineering Science, University of Johannesburg, P O Box 524, Auckland Park, Johannesburg 2006, South Africa. E-mail: btwala@uj.ac.za
Venue:
Intelligent Data Analysis
Year:
2013


Abstract

Machine learning has been successfully used for credit-evaluation decisions. Most research on machine learning assumes that the attributes of training and test instances are not only completely specified but also free from noise. Real-world data, however, often suffer from corruption or noise, and its presence is not always known. Such data are at the heart of information-based credit risk models. Blindly applying machine learning techniques to noisy financial credit risk evaluation data may therefore fail to produce accurate predictions. Unfortunately, despite extensive research over the last decades, the impact of poor data quality, especially noise, on the accuracy of credit risk prediction has attracted little attention, even though it remains a significant problem for many. This paper investigates the robustness of five supervised machine learning algorithms in a noisy credit risk environment. In particular, we show that when noise is added to four real-world credit risk domains, a significant and disproportionate number of total errors are contributed by class noise compared to attribute noise; thus, in the presence of noise, it is noise on the class variable that is responsible for the poor predictive accuracy of the learned concept.
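
The distinction between class noise and attribute noise can be made concrete with a small sketch. The abstract does not describe the paper's actual noise-injection protocol, so everything below is an assumption for illustration only: the function names `add_class_noise` and `add_attribute_noise`, the noise model (each value is corrupted independently with probability `rate`), and the toy records are all hypothetical.

```python
import random

def add_class_noise(labels, rate, rng):
    """Flip each label to a different class with probability `rate`.

    Assumed noise model, not taken from the paper.
    """
    classes = sorted(set(labels))
    noisy = []
    for y in labels:
        if len(classes) > 1 and rng.random() < rate:
            # Pick uniformly among the other observed classes.
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy

def add_attribute_noise(rows, rate, rng):
    """With probability `rate`, replace an attribute value by one drawn
    from the values observed for that attribute (the draw may happen to
    return the original value).

    Assumed noise model, not taken from the paper.
    """
    columns = list(zip(*rows))  # observed values per attribute
    noisy = []
    for row in rows:
        noisy.append([
            rng.choice(columns[j]) if rng.random() < rate else value
            for j, value in enumerate(row)
        ])
    return noisy

# Hypothetical toy credit records: [age, housing status] with a risk label.
rng = random.Random(0)
rows = [[25, "own"], [40, "rent"], [33, "own"], [58, "rent"]]
labels = ["good", "bad", "good", "good"]

noisy_labels = add_class_noise(labels, 0.25, rng)
noisy_rows = add_attribute_noise(rows, 0.25, rng)
```

In a robustness study of this kind, one would typically train each classifier on the corrupted training set at several noise rates and compare its accuracy on a clean test set against the noise-free baseline, once for class noise and once for attribute noise.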