Shell-neighbor method and its application in missing data imputation

Authors:
Shichao Zhang
Affiliations:
Department of Computer Science, Zhejiang Normal University, Jinhua, China and State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Venue:
Applied Intelligence
Year:
2011



Abstract

Data preparation is an important step in mining incomplete data. To address missing values, this paper introduces a new imputation approach called Shell Neighbors (SN) imputation, or simply SNI. SNI fills in an incomplete instance (one with missing values) in a given dataset using only its left and right nearest neighbors with respect to each factor (attribute), referred to as its Shell Neighbors. The left and right nearest neighbors are selected from a set of nearest neighbors of the incomplete instance, and the size of this set is determined by cross-validation. SNI is then generalized to handle missing data in datasets with mixed attributes, for example, continuous and categorical attributes. Experiments demonstrate that the generalized SNI method outperforms kNN imputation in both imputation accuracy and classification accuracy.
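
The selection step described in the abstract can be illustrated with a minimal sketch for continuous attributes. It is not the paper's exact algorithm: the function name `shell_neighbor_impute`, the Euclidean distance over observed attributes, the tie rule (a neighbor whose value equals the query's counts as a left neighbor), and the plain mean used to combine the shell neighbors' values are all assumptions made for illustration. The neighborhood size `k` is passed in directly, whereas the abstract chooses it by cross-validation.

```python
import numpy as np

def shell_neighbor_impute(X, q, k=5):
    """Fill NaN entries of q using shell neighbors drawn from complete rows X.

    Illustrative assumptions (not taken from the paper): Euclidean distance
    on the observed attributes, mean aggregation, ties go to the left side.
    Assumes q has at least one observed attribute.
    """
    X = np.asarray(X, dtype=float)
    q = np.asarray(q, dtype=float)
    miss = np.isnan(q)
    obs = ~miss

    # Step 1: the k nearest complete instances, measured on observed attributes.
    dist = np.sqrt(((X[:, obs] - q[obs]) ** 2).sum(axis=1))
    knn = np.argsort(dist, kind="stable")[:k]

    # Step 2: per observed attribute, keep only the closest neighbor on each
    # side of the query's value (the "left" and "right" nearest neighbors).
    shell = set()
    for j in np.flatnonzero(obs):
        diff = X[knn, j] - q[j]
        left = knn[diff <= 0]
        right = knn[diff > 0]
        if left.size:
            shell.add(int(left[np.argmax(X[left, j])]))
        if right.size:
            shell.add(int(right[np.argmin(X[right, j])]))

    # Step 3: impute each missing attribute from the shell neighbors only.
    idx = sorted(shell)
    filled = q.copy()
    filled[miss] = X[idx][:, miss].mean(axis=0)
    return filled, idx

# Toy data: the third attribute of the query is missing.
X = [[1, 1, 10], [2, 3, 20], [3, 2, 30], [4, 5, 60], [10, 10, 100]]
filled, shell_idx = shell_neighbor_impute(X, [2.5, 2.5, np.nan], k=4)
print(filled, shell_idx)
```

In this toy example the four nearest neighbors are rows 0 to 3, but only rows 1 and 2 bracket the query on each observed attribute, so the imputed value is the mean of their third attribute rather than the mean over all four neighbors, as plain kNN imputation would use.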