Nearest neighbor selection for iteratively kNN imputation

Authors:
Shichao Zhang
Affiliations:
College of Computer Science and Information Technology, Guangxi Normal University, Guilin, China and Institute of Computing Technology, The Chinese Academy of Sciences, Beijing, China and QUIS, Fa ...
Venue:
Journal of Systems and Software
Year:
2012

Abstract

Existing kNN imputation methods for dealing with missing data are designed according to Minkowski distance or its variants, and have been shown to be generally efficient for numerical variables (features, or attributes). To deal with heterogeneous (i.e., mixed-attribute) data, we propose a novel kNN (k nearest neighbor) imputation method that iteratively imputes missing data, named GkNN (gray kNN) imputation. GkNN selects the k nearest neighbors for each missing datum by calculating the gray distance between the missing datum and all the training data, rather than by using a traditional distance metric such as Euclidean distance. Such a distance metric can deal with both numerical and categorical attributes. To improve effectiveness, GkNN regards all imputed instances (i.e., instances whose missing data have been imputed) as observed data and uses them, together with the complete instances (instances without missing values), to iteratively impute the remaining missing data. We experimentally evaluate the proposed approach and demonstrate that the gray distance is much better than the Minkowski distance at both capturing the proximity relationship (or nearness) of two instances and dealing with mixed attributes. Moreover, the experimental results show that the GkNN algorithm is much more efficient than existing kNN imputation methods.
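The procedure outlined in the abstract can be sketched roughly as follows. The abstract does not give the gray distance formula, so this sketch assumes the usual gray relational coefficient from gray relational analysis, with per-attribute gaps scaled to [0, 1] (so the minimum and maximum gaps are taken to be 0 and 1) and a distinguishing coefficient rho = 0.5. The names gray_grade and gknn_impute, the single pass over incomplete instances, and the mean/mode fill rules are illustrative assumptions, not the paper's published algorithm.

```python
from collections import Counter
from statistics import mean
from typing import List, Optional, Sequence, Union

Value = Optional[Union[float, str]]  # None marks a missing value


def gap(a, b, is_numeric: bool) -> float:
    """Per-attribute difference in [0, 1].

    Assumption: numeric attributes are already scaled to [0, 1];
    categorical attributes differ by 0 (equal) or 1 (different).
    """
    return abs(a - b) if is_numeric else (0.0 if a == b else 1.0)


def gray_grade(target: Sequence[Value], candidate: Sequence[Value],
               numeric: Sequence[bool], rho: float = 0.5) -> float:
    """Mean gray relational coefficient over the attributes observed in target.

    Assumed form: (d_min + rho * d_max) / (gap + rho * d_max) with
    d_min = 0 and d_max = 1, which reduces to rho / (gap + rho).
    A higher grade means a closer neighbor.
    """
    coeffs = [rho / (gap(t, candidate[j], numeric[j]) + rho)
              for j, t in enumerate(target) if t is not None]
    return mean(coeffs) if coeffs else 0.0


def gknn_impute(rows: Sequence[Sequence[Value]], numeric: Sequence[bool],
                k: int = 3, rho: float = 0.5) -> List[List[Value]]:
    """Fill missing values, reusing each imputed instance as observed data."""
    pool = [list(r) for r in rows if None not in r]  # complete instances
    if not pool:
        raise ValueError("at least one complete instance is required")
    result = []
    for r in rows:
        filled = list(r)
        if None in r:
            # k candidates with the highest gray relational grade
            neighbors = sorted(pool, key=lambda c: gray_grade(r, c, numeric, rho),
                               reverse=True)[:k]
            for j, v in enumerate(r):
                if v is None:
                    vals = [n[j] for n in neighbors]
                    # assumed fill rule: mean for numeric, mode for categorical
                    filled[j] = (mean(vals) if numeric[j]
                                 else Counter(vals).most_common(1)[0][0])
            pool.append(filled)  # imputed instance now counts as observed
        result.append(filled)
    return result


# Hypothetical mixed-attribute data: (numeric, categorical, numeric)
rows = [
    [0.10, "red", 0.2],
    [0.20, "red", 0.4],
    [0.90, "blue", 0.8],
    [0.80, "blue", 0.7],
    [0.12, "red", None],
    [0.85, None, 0.9],
]
imputed = gknn_impute(rows, numeric=[True, False, True], k=2)
```

Because every gap lies in [0, 1], numerical and categorical attributes contribute on the same scale, which is the property the abstract attributes to the gray distance. A fuller implementation would likely repeat the imputation pass until the filled values stabilize.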