On the choice of the best imputation methods for missing values considering three groups of classification methods

  • Authors:
  • Julián Luengo; Salvador García; Francisco Herrera

  • Affiliations:
  • CITIC-University of Granada, Department of Computer Science and Artificial Intelligence, 18071, Granada, Spain; University of Jaén, Dept. of Computer Science, 23071, Jaén, Spain; CITIC-University of Granada, Department of Computer Science and Artificial Intelligence, 18071, Granada, Spain

  • Venue:
  • Knowledge and Information Systems
  • Year:
  • 2012

Abstract

In real-life data, information is frequently lost in data mining because of missing values in attributes. Several schemes have been studied to overcome the drawbacks that missing values cause in data mining tasks; one of the best known is based on preprocessing, known as imputation. In this work, we focus on a classification task and present and analyze twenty-three classification methods together with fourteen different imputation approaches to missing values treatment. The analysis follows a group-based approach, in which we distinguish between three categories of classification methods. Each category behaves differently, and the evidence obtained shows that using particular missing values imputation methods could improve the accuracy these methods achieve. This study establishes the convenience of using imputation methods to preprocess data sets with missing values. The analysis suggests that the choice of imputation method should be conditioned on the group to which the classification method belongs.
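To make the idea of imputation as a preprocessing step concrete, here is a minimal sketch of one of the simplest strategies, mean/mode substitution: each missing numeric value is replaced by the attribute's mean, and each missing nominal value by the attribute's most frequent value. This is an illustrative baseline, not code from the study, and it is not claimed to reproduce any of the fourteen methods compared there. The helper name `impute_mean_mode` is hypothetical; missing values are assumed to be encoded as `None`, and the data set is assumed to be a list of equal-length rows.

```python
from collections import Counter
from statistics import mean
from typing import List, Union

# Assumption: a cell is a number, a nominal string, or None (missing).
Value = Union[float, str, None]


def impute_mean_mode(rows: List[List[Value]]) -> List[List[Value]]:
    """Hypothetical helper: fill None cells with the column mean (numeric
    columns) or the column's most frequent value (nominal columns)."""
    if not rows:
        return []
    n_cols = len(rows[0])
    fill: List[Value] = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            # No observed values to impute from: leave the column missing.
            fill.append(None)
        elif all(isinstance(v, (int, float)) for v in observed):
            fill.append(mean(observed))
        else:
            fill.append(Counter(observed).most_common(1)[0][0])
    # Build a new table; observed cells are copied unchanged.
    return [[fill[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


if __name__ == "__main__":
    data = [
        [1.0, "red"],
        [None, "blue"],
        [3.0, None],
        [2.0, "blue"],
    ]
    # Column 0 is filled with mean(1.0, 3.0, 2.0) = 2.0; column 1 with "blue".
    print(impute_mean_mode(data))
```

The imputed table would then be handed to a classifier as usual. In practice the fill values would normally be computed on the training partition only and reused on the test partition, so that no information leaks from the test data; and, as the abstract argues, which imputation method helps most may depend on the group of classifiers that consumes the data.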