Partial imputation of unseen records to improve classification using a hybrid multi-layered artificial immune system and genetic algorithm

Authors:
Mlungisi Duma; Tshilidzi Marwala; Bhekisipho Twala; Fulufhelo Nelwamondo
Venue:
Applied Soft Computing
Year:
2013

Abstract

Missing data in large insurance datasets reduces learning and classification accuracy in predictive modelling. Insurance datasets will continue to grow as more variables are added to help manage client risk, and will therefore become even more vulnerable to missing data. This paper proposes a hybrid multi-layered artificial immune system and genetic algorithm for partial imputation of missing data in datasets with numerous variables. The multi-layered artificial immune system creates and stores antibodies that bind to and annihilate an antigen. The genetic algorithm optimises the learning process of a stimulated antibody. The imputation is evaluated using the RIPPER, k-nearest neighbour, naive Bayes and logistic discriminant classifiers. The effect of the imputation on the classifiers is compared with that of the mean/mode and hot deck imputation methods. The results demonstrate that when missing data imputation is performed using the proposed hybrid method, classification improves and robustness to the amount of missing data increases relative to the mean/mode method for data missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR). The imputation performance is similar to or marginally better than that of hot deck imputation.
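
For orientation, the two baseline methods named in the abstract can be sketched in a few lines of Python. This is a minimal illustration only, not the paper's implementation and not the proposed hybrid method: the function names, the use of None as the missing-value marker, and the toy records are all assumptions made for this sketch. The hot deck variant shown is the simplest random-donor form, and every column is assumed to have at least one observed value.

```python
import random
from collections import Counter
from statistics import mean

MISSING = None  # assumed marker for a missing cell in this sketch


def mean_mode_impute(rows):
    """Fill missing cells with the column mean (numeric) or mode (categorical)."""
    fills = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        if all(isinstance(v, (int, float)) for v in observed):
            fills.append(mean(observed))
        else:
            fills.append(Counter(observed).most_common(1)[0][0])
    return [[fills[j] if v is MISSING else v for j, v in enumerate(r)]
            for r in rows]


def hot_deck_impute(rows, rng):
    """Fill missing cells with the value of a randomly chosen donor record.

    A donor is any record that observed the column in question
    (random hot deck; similarity-based donor selection is not shown).
    """
    donors = [[r[j] for r in rows if r[j] is not MISSING]
              for j in range(len(rows[0]))]
    return [[rng.choice(donors[j]) if v is MISSING else v
             for j, v in enumerate(r)]
            for r in rows]


# Hypothetical toy records: (age, region)
records = [[30, "urban"], [MISSING, "urban"], [50, MISSING], [40, "rural"]]
mean_mode_filled = mean_mode_impute(records)
hot_deck_filled = hot_deck_impute(records, random.Random(0))
```

Mean/mode imputation fills every gap in a column with the same value, which shrinks the variance of that column, whereas hot deck draws replacements from observed records and so preserves more of the original distribution.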