Mixture of Gaussians for distance estimation with missing data

Authors:
Emil Eirola;Amaury Lendasse;Vincent Vandewalle;Christophe Biernacki
Affiliations:
-;-;-;-
Venue:
Neurocomputing
Year:
2014

Citing 9
Cited 0

Mixture model clustering for mixed data with missing information

Computational Statistics & Data Analysis
Training and Application of Artificial Neural Networks with Incomplete Data

IEA/AIE '02 Proceedings of the 15th international conference on Industrial and engineering applications of artificial intelligence and expert systems: developments in applied artificial intelligence
Learning from Incomplete Data

Learning from Incomplete Data
Pattern Recognition and Machine Learning (Information Science and Statistics)

Pattern Recognition and Machine Learning (Information Science and Statistics)
On fast supervised learning for normal mixture models with missing information

Pattern Recognition
K nearest neighbours with mutual information for simultaneous classification and missing data imputation

Neurocomputing
A conservative feature subset selection algorithm with missing data

Neurocomputing
Feature selection with missing data using mutual information estimators

Neurocomputing
Incomplete-case nearest neighbor imputation in software measurement data

Information Sciences: an International Journal

Quantified Score

Hi-index	0.01

Visualization

Abstract

Many data sets have missing values in practical application contexts, but the majority of commonly studied machine learning methods cannot be applied directly when there are incomplete samples. However, most such methods only depend on the relative differences between samples instead of their particular values, and thus one useful approach is to directly estimate the pairwise distances between all samples in the data set. This is accomplished by fitting a Gaussian mixture model to the data, and using it to derive estimates for the distances. A variant of the model for high-dimensional data with missing values is also studied. Experimental simulations confirm that the proposed method provides accurate estimates compared to alternative methods for estimating distances. In particular, using the mixture model for estimating distances is on average more accurate than using the same model to impute any missing values and then calculating distances. The experimental evaluation additionally shows that more accurately estimating distances lead to improved prediction performance for classification and regression tasks when used as inputs for a neural network.