Towards efficient imputation by nearest-neighbors: a clustering-based approach

Authors:
Eduardo R. Hruschka;Estevam R. Hruschka;Nelson F. F. Ebecken
Affiliations:
Universidade Católica de Santos (UniSantos), Brasil;Universidade Federal de São Carlos (UFSCAR), Brasil;COPPE / Universidade Federal do Rio de Janeiro, Brasil
Venue:
AI'04 Proceedings of the 17th Australian joint conference on Advances in Artificial Intelligence
Year:
2004

Abstract

This paper proposes and evaluates a nearest-neighbor method to substitute missing values in ordinal/continuous datasets. In a nutshell, the K-Means clustering algorithm is applied to the complete dataset (without missing values) before the imputation process by nearest-neighbors takes place. Then, the resulting cluster centroids are employed as training instances for the nearest-neighbor method. The proposed method is more efficient than the traditional nearest-neighbor method, and simulations performed on three benchmark datasets also indicate that it provides suitable imputations, both in terms of prediction and classification tasks.
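
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`sq_dist`, `kmeans`, `impute`), the toy data, the deterministic farthest-first seeding, the use of a single nearest centroid, and the choice to measure distance only over the observed attributes are all assumptions made for illustration.

```python
def sq_dist(a, b, dims=None):
    """Squared Euclidean distance, optionally restricted to the indices in dims."""
    dims = range(len(a)) if dims is None else dims
    return sum((a[i] - b[i]) ** 2 for i in dims)


def kmeans(rows, k, max_iter=100):
    """Lloyd-style K-Means on complete rows; returns a list of k centroids."""
    # Assumption: farthest-first seeding, chosen only to keep the sketch deterministic.
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(max_iter):
        # Assign every row to its closest centroid.
        groups = [[] for _ in range(k)]
        for r in rows:
            nearest = min(range(k), key=lambda j: sq_dist(r, centroids[j]))
            groups[nearest].append(r)
        # Recompute each centroid as the attribute-wise mean of its group.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def impute(row, centroids):
    """Fill None entries of row with the values of the nearest centroid.

    Assumption: distance is computed over the observed attributes only,
    and a single nearest centroid (1-NN) supplies the missing values.
    """
    observed = [i for i, v in enumerate(row) if v is not None]
    nearest = min(centroids, key=lambda c: sq_dist(row, c, observed))
    return [nearest[i] if v is None else v for i, v in enumerate(row)]


# Hypothetical toy data: complete instances forming two well-separated groups.
complete = [
    [1.0, 1.0], [1.5, 0.5], [0.5, 1.5],
    [9.0, 9.0], [9.5, 8.5], [8.5, 9.5],
]
centroids = kmeans(complete, k=2)
filled = impute([9.2, None], centroids)
print(centroids)
print(filled)
```

The efficiency argument is visible in `impute`: each incomplete instance is compared against the k centroids rather than against every complete instance, so the cost of a query depends on the number of clusters and not on the size of the dataset.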