Impact of noise on credit risk prediction: Does data quality really matter?

  • Authors: Bhekisipho Twala

  • Affiliations: Department of Electrical and Electronic Engineering Science, University of Johannesburg, P O Box 524, Auckland Park, Johannesburg 2006, South Africa. E-mail: btwala@uj.ac.za

  • Venue: Intelligent Data Analysis
  • Year: 2013

Abstract

Machine learning has been successfully used for credit-evaluation decisions. Most research on machine learning assumes that the attributes of training and test instances are not only completely specified but also free from noise. Real-world data, however, often suffer from corruption or noise, and the presence of that noise is not always known. Data of this kind is at the heart of information-based credit risk models, so blindly applying machine learning techniques to noisy credit risk evaluation data may fail to yield good predictions. Despite extensive research over the last decades, the impact of poor data quality, especially noise, on the accuracy of credit risk prediction has attracted little attention, even though it remains a significant problem for many. This paper investigates the robustness of five supervised machine learning algorithms to a noisy credit risk environment. In particular, we show that when noise is added to four real-world credit risk domains, a significant and disproportionate number of total errors are contributed by class noise compared to attribute noise; thus, in the presence of noise, it is noise on the class variable that is responsible for the poor predictive accuracy of the learned concept.
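
To make the distinction between class noise and attribute noise concrete, the following is a minimal Python sketch of how the two kinds of noise might be injected into a data set before training. The abstract does not specify the paper's noise-injection protocol, so everything here is an assumption for illustration: the function names `add_class_noise` and `add_attribute_noise`, the uniform replacement scheme, the `rate` parameter, and the toy applicant data are all hypothetical.

```python
import random


def add_class_noise(labels, rate, classes, rng):
    """Flip each label to a different class with probability `rate`.

    Assumed scheme: the replacement class is chosen uniformly from the
    remaining classes.
    """
    noisy = []
    for y in labels:
        if rng.random() < rate:
            alternatives = [c for c in classes if c != y]
            noisy.append(rng.choice(alternatives))
        else:
            noisy.append(y)
    return noisy


def add_attribute_noise(rows, rate, rng):
    """Replace each numeric attribute value, with probability `rate`, by a
    value drawn uniformly from that attribute's observed [min, max] range.

    Assumed scheme: noise is applied independently to every cell.
    """
    columns = list(zip(*rows))
    bounds = [(min(col), max(col)) for col in columns]
    noisy = []
    for row in rows:
        new_row = []
        for value, (low, high) in zip(row, bounds):
            if rng.random() < rate:
                new_row.append(rng.uniform(low, high))
            else:
                new_row.append(value)
        noisy.append(new_row)
    return noisy


# Hypothetical toy data: [age, monthly income] with a credit label.
rng = random.Random(0)
X = [[25.0, 1200.0], [40.0, 5300.0], [33.0, 800.0], [58.0, 9100.0]]
y = ["good", "bad", "good", "bad"]

y_noisy = add_class_noise(y, rate=0.25, classes=["good", "bad"], rng=rng)
X_noisy = add_attribute_noise(X, rate=0.25, rng=rng)
```

A robustness study along the lines described above would then train the same classifier once on `(X, y_noisy)` and once on `(X_noisy, y)` at matching noise rates, and compare test-set error against a classifier trained on the clean data.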