Partial imputation of unseen records to improve classification using a hybrid multi-layered artificial immune system and genetic algorithm

  • Authors:
  • Mlungisi Duma; Tshilidzi Marwala; Bhekisipho Twala; Fulufhelo Nelwamondo

  • Venue:
  • Applied Soft Computing
  • Year:
  • 2013

Abstract

Missing data in large insurance datasets affects the learning and classification accuracies in predictive modelling. Insurance datasets will continue to increase in size as more variables are added to aid in managing client risk and will therefore be even more vulnerable to missing data. This paper proposes a hybrid multi-layered artificial immune system and genetic algorithm for partial imputation of missing data in datasets with numerous variables. The multi-layered artificial immune system creates and stores antibodies that bind to and annihilate an antigen. The genetic algorithm optimises the learning process of a stimulated antibody. The imputation is evaluated using the RIPPER, k-nearest neighbour, naive Bayes and logistic discriminant classifiers. The effect of the imputation on the classifiers is compared with that of the mean/mode and hot deck imputation methods. The results demonstrate that when missing data imputation is performed using the proposed hybrid method, the classification improves and the robustness to the amount of missing data is increased relative to the mean/mode method for data missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR). The imputation performance is similar to or marginally better than that of the hot deck imputation.
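
For readers unfamiliar with the two baseline methods named in the abstract, the following is a minimal sketch of mean/mode imputation and a simple random hot deck, together with an MCAR masking helper. It is not the paper's implementation: the function names (`mean_mode_impute`, `hot_deck_impute`, `mask_mcar`), the use of `None` as the missing-value marker, and the choice of a random-donor hot deck (one common variant; the abstract does not say which variant was used) are all assumptions made for illustration. The proposed hybrid immune system and genetic algorithm method is not shown.

```python
import random
from statistics import mean, mode


def mask_mcar(rows, rate, seed=0):
    """Hide each cell independently with probability `rate` (MCAR assumption)."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in r] for r in rows]


def mean_mode_impute(rows):
    """Fill None cells per column: mean for numeric columns, mode otherwise."""
    filled = [list(r) for r in rows]
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            continue  # no observed values to learn from; leave column as-is
        numeric = all(isinstance(v, (int, float)) for v in observed)
        fill = mean(observed) if numeric else mode(observed)
        for r in filled:
            if r[j] is None:
                r[j] = fill
    return filled


def hot_deck_impute(rows, seed=0):
    """Fill None cells with a value copied from a random observed donor
    in the same column (random hot deck; an assumed variant)."""
    rng = random.Random(seed)
    filled = [list(r) for r in rows]
    for j in range(len(rows[0])):
        donors = [r[j] for r in rows if r[j] is not None]
        if not donors:
            continue  # no donor available for this column
        for r in filled:
            if r[j] is None:
                r[j] = rng.choice(donors)
    return filled


# Hypothetical toy records: (age, region); None marks a missing value.
records = [[30, "urban"], [None, "rural"], [50, None], [40, "urban"]]
mean_mode_filled = mean_mode_impute(records)
hot_deck_filled = hot_deck_impute(records)
```

In an evaluation like the one the abstract describes, the imputed records would then be passed to a classifier and the resulting accuracy compared across imputation methods and missingness rates.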