Missing values imputation for a clustering genetic algorithm

Authors:
Eduardo R. Hruschka; Estevam R. Hruschka; Nelson F. F. Ebecken
Affiliations:
Catholic University of Santos (UniSantos), Santos, SP, Brazil; Federal University of São Carlos, São Carlos, SP, Brazil; COPPE / Federal University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil
Venue:
ICNC'05 Proceedings of the First international conference on Advances in Natural Computation - Volume Part III
Year:
2005

Abstract

The substitution of missing values, also called imputation, is an important data preparation task for data mining applications. This paper describes a nearest-neighbor method for imputing missing values and shows that it can be useful for a clustering genetic algorithm. The proposed method is assessed by means of simulations performed on two datasets that are benchmarks for data mining methods: Wisconsin Breast Cancer and Congressional Voting Records. Its efficacy is evaluated in both prediction and clustering scenarios. Empirical results show that the employed imputation method is a suitable data preparation tool.
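
The abstract does not specify the distance measure, the number of neighbors, or how neighbor values are combined, so the following is only a minimal sketch of generic nearest-neighbor imputation, not the authors' method. The function name `nn_impute`, the parameter `k`, the Euclidean distance restricted to attributes observed in both rows, and the use of the neighbors' mean are all assumptions made for illustration.

```python
import math


def nn_impute(rows, k=1):
    """Return a copy of rows with None entries filled from the k nearest rows.

    Hypothetical helper: distance, k, and mean aggregation are assumptions,
    not details taken from the paper.
    """

    def distance(a, b):
        # Compare only attributes observed (not None) in both rows,
        # normalizing by their count so rows with more shared
        # attributes are not penalized.
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which attribute j is observed
            # and which share at least one observed attribute with this row.
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                d = distance(row, other)
                if d != math.inf:
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                # Assumed aggregation: mean of the donors' values.
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed
```

For instance, calling `nn_impute([[1.0, 2.0, 3.0], [1.0, 2.0, None], [9.0, 9.0, 9.0]], k=1)` fills the missing entry in the second row from the first row, which is its closest neighbor on the two shared attributes. For categorical attributes such as those in the Congressional Voting Records data, a majority vote among neighbors would likely be a more natural choice than the mean.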