On the choice of the best imputation methods for missing values considering three groups of classification methods

Authors:
Julián Luengo;Salvador García;Francisco Herrera
Affiliations:
CITIC-University of Granada, Department of Computer Science and Artificial Intelligence, 18071, Granada, Spain;University of Jaén, Dept. of Computer Science, 23071, Jaén, Spain;CITIC-University of Granada, Department of Computer Science and Artificial Intelligence, 18071, Granada, Spain
Venue:
Knowledge and Information Systems
Year:
2012

Abstract

In real-life data mining, information is frequently lost because attributes contain missing values. Several schemes have been studied to overcome the drawbacks that missing values produce in data mining tasks; one of the best known is based on preprocessing and is formally known as imputation. In this work, we focus on the classification task: twenty-three classification methods and fourteen imputation approaches to missing-value treatment are presented and analyzed. The analysis takes a group-based approach in which we distinguish three categories of classification methods. Each category behaves differently, and the evidence obtained shows that specific imputation methods can improve the accuracy of the classifiers in each category. The study establishes the benefit of using imputation methods to preprocess data sets with missing values, and the analysis suggests that the imputation method should be chosen according to the group to which the classifier belongs.
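To make the idea of imputation as a preprocessing step concrete, the following minimal sketch fills missing entries in a small numeric table using two widely used schemes: column-mean imputation and k-nearest-neighbour imputation. Both were chosen here purely for illustration; the abstract does not enumerate the fourteen approaches studied, and the function names, the toy data, and the use of `None` as the missing-value marker are all assumptions of this sketch, not the authors' implementation.

```python
from math import sqrt
from statistics import mean


def mean_impute(rows):
    """Replace each missing entry (None) with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = [
        mean([r[j] for r in rows if r[j] is not None]) for j in range(n_cols)
    ]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows
    ]


def knn_impute(rows, k=2):
    """Replace each missing entry with the mean of that column over the k nearest rows.

    Distances use only the columns observed in both rows (an assumption of
    this sketch; other conventions exist).
    """

    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return float("inf")
        return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, v in enumerate(row):
            if v is not None:
                continue
            # Candidate donors: every other row that has column j observed.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors.sort(key=lambda d: d[0])
            new_row[j] = mean([value for _, value in donors[:k]])
        filled.append(new_row)
    return filled


# Hypothetical toy data: two numeric attributes, one missing value.
data = [
    [1.0, 2.0],
    [1.1, None],
    [5.0, 6.0],
    [5.2, 6.4],
    [0.9, 1.8],
]

mean_filled = mean_impute(data)
knn_filled = knn_impute(data, k=2)
```

On this toy table the two schemes give noticeably different values for the same missing cell: the column mean is pulled toward the distant rows, while the neighbour-based estimate follows the rows closest to the incomplete one. A classifier trained on each completed table may therefore behave differently, which is the kind of interaction between imputation method and classifier that the study examines.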