Towards efficient imputation by nearest-neighbors: a clustering-based approach

  • Authors: Eduardo R. Hruschka; Estevam R. Hruschka; Nelson F. F. Ebecken

  • Affiliations: Universidade Católica de Santos (UniSantos), Brazil; Universidade Federal de São Carlos (UFSCAR), Brazil; COPPE / Universidade Federal do Rio de Janeiro, Brazil

  • Venue: AI'04 Proceedings of the 17th Australian joint conference on Advances in Artificial Intelligence
  • Year: 2004

Abstract

This paper proposes and evaluates a nearest-neighbor method to substitute missing values in ordinal/continuous datasets. In a nutshell, the K-Means clustering algorithm is applied in the complete dataset (without missing values) before the imputation process by nearest-neighbors takes place. Then, the achieved cluster centroids are employed as training instances for the nearest-neighbor method. The proposed method is more efficient than the traditional nearest-neighbor method, and simulations performed in three benchmark datasets also indicate that it provides suitable imputations, both in terms of prediction and classification tasks.
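
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance measure, the number of neighbors, or how K-Means is initialized. The sketch assumes squared Euclidean distance computed over the observed attributes only, a single nearest centroid, and a plain Lloyd-style K-Means with random initialization. The function names `kmeans` and `impute_with_centroids` and the parameters `k`, `n_iter`, and `seed` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd-style K-Means; returns a (k, n_features) centroid array."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize centroids from k distinct rows chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid: (n, k).
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids


def impute_with_centroids(X, k, seed=0):
    """Fill NaN entries using the nearest K-Means centroid.

    Centroids are fitted on the complete rows only, then each incomplete
    row copies its missing attributes from the closest centroid.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    complete_rows = X[~missing.any(axis=1)]
    centroids = kmeans(complete_rows, k, seed=seed)

    imputed = X.copy()
    for i in np.where(missing.any(axis=1))[0]:
        observed = ~missing[i]
        # Assumption: distance uses only the attributes observed in row i.
        dist = ((centroids[:, observed] - X[i, observed]) ** 2).sum(axis=1)
        nearest = centroids[dist.argmin()]
        imputed[i, missing[i]] = nearest[missing[i]]
    return imputed
```

The efficiency argument is visible in the inner loop: each incomplete row is compared against `k` centroids rather than against every complete row, so the search cost per imputed row depends on `k` and not on the dataset size. In this sketch `k` must not exceed the number of complete rows.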