Missing values imputation for a clustering genetic algorithm

  • Authors:
  • Eduardo R. Hruschka; Estevam R. Hruschka; Nelson F. F. Ebecken

  • Affiliations:
  • Catholic University of Santos (UniSantos), Santos, SP, Brazil; Federal University of São Carlos, São Carlos, SP, Brazil; COPPE / Federal University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil

  • Venue:
  • ICNC'05: Proceedings of the First International Conference on Advances in Natural Computation, Volume Part III
  • Year:
  • 2005

Abstract

The substitution of missing values, also called imputation, is an important data preparation task for data mining applications. This paper describes a nearest-neighbor method for imputing missing values and shows that it can be useful for a clustering genetic algorithm. The proposed method is assessed by means of simulations on two datasets that are benchmarks for data mining methods: Wisconsin Breast Cancer and Congressional Voting Records. Its efficacy is evaluated in both prediction and clustering scenarios. Empirical results show that the employed imputation method is a suitable data preparation tool.
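The abstract does not specify the distance measure or how many neighbors are consulted, so the following is only a minimal sketch of generic nearest-neighbor imputation, not the authors' exact procedure. It assumes Euclidean distance computed over the attributes observed in the incomplete record, complete records as the only donors, and a single nearest neighbor (k = 1). Missing values are represented as `None`, and the names `distance` and `nn_impute` are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Record = List[Optional[float]]


def distance(a: Sequence[Optional[float]], b: Sequence[Optional[float]]) -> float:
    """Euclidean distance restricted to the attributes observed in `a`.

    Assumption: attributes missing in `a` are simply ignored.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b) if x is not None))


def nn_impute(data: List[Record]) -> List[List[Optional[float]]]:
    """Fill each None with the value of the nearest complete record (1-NN).

    Hypothetical helper: only complete records act as donors, and at least
    one complete record must exist.
    """
    donors = [row for row in data if all(v is not None for v in row)]
    if not donors:
        raise ValueError("need at least one complete record to impute from")

    imputed = []
    for row in data:
        if all(v is not None for v in row):
            # Complete records are copied unchanged.
            imputed.append(list(row))
            continue
        # Pick the donor closest to this record on its observed attributes.
        nearest = min(donors, key=lambda d: distance(row, d))
        # Keep observed values; take missing ones from the nearest donor.
        imputed.append([d if v is None else v for v, d in zip(row, nearest)])
    return imputed


if __name__ == "__main__":
    data = [
        [1.0, 2.0, 3.0],
        [8.0, 9.0, 10.0],
        [1.5, None, 2.5],
        [None, 8.5, 9.5],
    ]
    print(nn_impute(data))
    # [[1.0, 2.0, 3.0], [8.0, 9.0, 10.0], [1.5, 2.0, 2.5], [8.0, 8.5, 9.5]]
```

Using a single neighbor copies an actually observed value, which keeps imputed entries inside the original domain of each attribute (for example, binary yes/no votes). Averaging over several neighbors would instead produce fractional values for such attributes.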