K nearest neighbours with mutual information for simultaneous classification and missing data imputation

Authors:
Pedro J. García-Laencina; José-Luis Sancho-Gómez; Aníbal R. Figueiras-Vidal; Michel Verleysen
Affiliations:
Department of Information and Communications Technologies, Universidad Politécnica de Cartagena, Plaza del Hospital 1, 30202 Cartagena, Murcia, Spain; Department of Information and Communications Technologies, Universidad Politécnica de Cartagena, Plaza del Hospital 1, 30202 Cartagena, Murcia, Spain; Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Avda. de la Universidad 30, 28911 Leganés, Madrid, Spain; Université catholique de Louvain, Machine Learning Group, DICE. 3 place du Levant, B-1348 Louvain-la-Neuve, Belgium
Venue:
Neurocomputing
Year:
2009

Abstract

Missing data is a common problem in many real-life pattern classification scenarios. One of the most popular solutions is to impute the missing values with the K nearest neighbours (KNN) algorithm. In this article, we propose a novel KNN imputation procedure that uses a feature-weighted distance metric based on mutual information (MI). The method estimates missing values with the classification task in mind, i.e., it produces an imputed dataset directed toward improving classification performance. The MI-based distance metric is also used to implement an effective KNN classifier. Experimental results on both artificial and real classification datasets illustrate the efficiency and robustness of the proposed algorithm.
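
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made for illustration only: MI is estimated with a simple histogram (plug-in) estimator, the distance is a weighted Euclidean distance over the features observed in both patterns and normalized by the sum of their weights, and a missing value is filled with the plain mean of its K nearest donors. The function names (`mutual_information`, `weighted_distance`, `mi_knn_impute`, `knn_classify`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(values, labels, bins=4):
    """Histogram (plug-in) estimate, in nats, of the MI between one feature
    and the class label. Missing entries (None) are skipped."""
    pairs = [(v, c) for v, c in zip(values, labels) if v is not None]
    lo = min(v for v, _ in pairs)
    hi = max(v for v, _ in pairs)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    binned = [(min(int((v - lo) / width), bins - 1), c) for v, c in pairs]
    n = len(binned)
    joint = Counter(binned)
    n_bin = Counter(b for b, _ in binned)
    n_cls = Counter(c for _, c in binned)
    mi = sum(k / n * math.log(k * n / (n_bin[b] * n_cls[c]))
             for (b, c), k in joint.items())
    return max(mi, 0.0)  # clamp tiny negative rounding errors


def weighted_distance(a, b, weights):
    """MI-weighted Euclidean distance over features observed in both patterns
    (assumed form, normalized by the total weight of the shared features)."""
    shared = [j for j in range(len(a)) if a[j] is not None and b[j] is not None]
    den = sum(weights[j] for j in shared)
    if den <= 0:
        return math.inf  # nothing informative to compare
    num = sum(weights[j] * (a[j] - b[j]) ** 2 for j in shared)
    return math.sqrt(num / den)


def mi_knn_impute(X, y, k=3, bins=4):
    """Fill each missing value with the mean of that feature over the k
    nearest patterns (under the MI-weighted distance) that observe it."""
    n_features = len(X[0])
    weights = [mutual_information([row[j] for row in X], y, bins)
               for j in range(n_features)]
    imputed = [list(row) for row in X]
    for i, row in enumerate(X):
        for j in range(n_features):
            if row[j] is not None:
                continue
            donors = [(weighted_distance(row, other, weights), other[j])
                      for m, other in enumerate(X)
                      if m != i and other[j] is not None]
            donors = sorted((d for d in donors if math.isfinite(d[0])),
                            key=lambda d: d[0])[:k]
            if donors:  # leave the gap if no usable donor exists
                imputed[i][j] = sum(v for _, v in donors) / len(donors)
    return imputed, weights


def knn_classify(query, X, y, weights, k=3):
    """Majority vote among the k nearest patterns under the same distance."""
    nearest = sorted(((weighted_distance(query, row, weights), label)
                      for row, label in zip(X, y)), key=lambda t: t[0])[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy data: features 0 and 1 separate the classes, feature 2 is mostly noise.
X = [
    [1.0, 10.0, 3.0], [1.2, 11.0, 7.0], [0.8, 9.0, 5.0], [1.1, 10.5, 1.0],
    [5.0, 50.0, 3.0], [5.2, 51.0, 7.0], [4.8, 49.0, 5.0], [5.1, 50.5, 1.0],
    [None, 10.2, 7.0],  # class 0 pattern with feature 0 missing
    [4.9, None, 1.0],   # class 1 pattern with feature 1 missing
]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]

imputed, weights = mi_knn_impute(X, y, k=3)
predicted = knn_classify([1.05, 10.0, 7.0], imputed, y, weights, k=3)
```

In this toy setting the noisy third feature receives a much smaller weight than the two class-relevant features, so it has little influence on which neighbours are selected for imputation or for classification. A kernel-based MI estimator or a distance-weighted average of the donors could be substituted without changing the overall structure.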