Nearest neighbours in least-squares data imputation algorithms with different missing patterns

Authors:
Ito Wasito; Boris Mirkin
Affiliations:
Department of Electrical and Computer Engineering, Faculty of Engineering, IIUM, Jl. Gombak, 53100 Kuala-Lumpur, Malaysia; School of Computer Science and Information Systems, Birkbeck College, University of London, Malet Street, London, WC1E 7HX, UK
Venue:
Computational Statistics & Data Analysis
Year:
2006

Abstract

Methods for imputation of missing data within the so-called least-squares approximation approach, a non-parametric and computationally efficient multidimensional technique, are compared experimentally. Contributions are made to each of the three components of the experimental setting: (a) the algorithms to be compared, (b) data generation, and (c) patterns of missing data. Specifically, "global" methods for least-squares data imputation are reviewed, and extensions to them based on the nearest neighbours (NN) approach are proposed. A conventional generator of mixtures of Gaussian distributions is analysed theoretically and then modified to scale clusters differently. Patterns of missing data are defined in terms of rows and columns according to three different mechanisms, referred to as Random missings, Restricted random missings, and Merged database. It appears that NN-based versions almost always outperform their global counterparts. With the Random missings pattern, the winner is always the authors' two-stage method INI, which combines global and local imputation algorithms.
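To make the contrast between "global" and NN-based least-squares imputation concrete, the following is a minimal sketch in Python. It is not the authors' implementation and does not reproduce INI or any other algorithm evaluated in the paper. It assumes a generic rank-one model x[i][k] ≈ z[i]·c[k], fitted by alternating least squares over the observed cells only, and a simple NN variant that refits the same model on a target row together with its nearest rows. All function names (random_missings, rank_one_fit, global_impute, nn_impute), the distance measure, and the default parameters are hypothetical choices made for illustration.

```python
import random


def random_missings(n_rows, n_cols, proportion, seed=0):
    """Observation mask (True = observed) with cells dropped uniformly at random.

    Illustrative stand-in for a "Random missings" pattern; the exact
    mechanism used in the paper is not specified in the abstract.
    """
    rng = random.Random(seed)
    cells = [(i, k) for i in range(n_rows) for k in range(n_cols)]
    missing = set(rng.sample(cells, int(round(proportion * len(cells)))))
    return [[(i, k) not in missing for k in range(n_cols)] for i in range(n_rows)]


def rank_one_fit(x, mask, iters=200):
    """Alternating least squares for x[i][k] ~ z[i] * c[k] on observed cells."""
    n, m = len(x), len(x[0])
    c = [1.0] * m
    z = [0.0] * n
    for _ in range(iters):
        for i in range(n):  # update row scores with column loadings fixed
            num = sum(x[i][k] * c[k] for k in range(m) if mask[i][k])
            den = sum(c[k] ** 2 for k in range(m) if mask[i][k])
            z[i] = num / den if den else 0.0
        for k in range(m):  # update column loadings with row scores fixed
            num = sum(x[i][k] * z[i] for i in range(n) if mask[i][k])
            den = sum(z[i] ** 2 for i in range(n) if mask[i][k])
            c[k] = num / den if den else 0.0
    return z, c


def global_impute(x, mask):
    """Fill missing cells from one rank-one model fitted to the whole table."""
    z, c = rank_one_fit(x, mask)
    return [[x[i][k] if mask[i][k] else z[i] * c[k] for k in range(len(c))]
            for i in range(len(z))]


def nn_impute(x, mask, n_neighbours=3):
    """Fill each incomplete row from a model fitted to it and its nearest rows."""
    n, m = len(x), len(x[0])
    filled = [row[:] for row in x]
    for i in range(n):
        if all(mask[i]):
            continue

        def dist(j):
            # Mean squared difference over columns observed in both rows.
            shared = [k for k in range(m) if mask[i][k] and mask[j][k]]
            if not shared:
                return float("inf")
            return sum((x[i][k] - x[j][k]) ** 2 for k in shared) / len(shared)

        neighbours = sorted((j for j in range(n) if j != i), key=dist)[:n_neighbours]
        rows = [i] + neighbours
        z, c = rank_one_fit([x[j] for j in rows], [mask[j] for j in rows])
        for k in range(m):
            if not mask[i][k]:
                filled[i][k] = z[0] * c[k]  # z[0] is the target row's score
    return filled
```

On data that follow a single low-rank structure, the global and local fits should give similar values. The local variant is intended for data with cluster structure, such as the Gaussian mixtures mentioned in the abstract, where a model fitted only to nearby rows can track the local pattern more closely.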