Imputation of missing values for compositional data using classical and robust methods

Authors:
K. Hron;M. Templ;P. Filzmoser
Affiliations:
Department of Mathematical Analysis and Applications of Mathematics, Palacký University, Faculty of Science, 17. listopadu 12, 771 46 Olomouc, Czech Republic;Department of Statistics and Probability Theory, Vienna University of Technology, Wiedner Hauptstraße 8-10, 1040 Vienna, Austria and Statistics Austria, Guglgasse 13, 1110 Vienna, Austria;Department of Statistics and Probability Theory, Vienna University of Technology, Wiedner Hauptstraße 8-10, 1040 Vienna, Austria
Venue:
Computational Statistics & Data Analysis
Year:
2010

Citing 7
Cited 7

The statistical analysis of compositional data
Missing value estimation for DNA microarray gene expression data: local least squares imputation (Bioinformatics)
Computing LTS Regression for Large Data Sets (Data Mining and Knowledge Discovery)
Non-linear PCA: a missing data approach (Bioinformatics)
Principal component analysis for data containing outliers and missing elements (Computational Statistics & Data Analysis)
A modified EM alr-algorithm for replacing rounded zeros in compositional data sets (Computers & Geosciences)
Impact of non-normal random effects on inference by multiple imputation: A simulation assessment (Computational Statistics & Data Analysis)

Editorial: Special issue on variable selection and robust procedures (Computational Statistics & Data Analysis)
Iterative stepwise regression imputation using standard and robust methods (Computational Statistics & Data Analysis)
Interpretation of multivariate outliers for compositional data (Computers & Geosciences)
Exploring incomplete data using visualization techniques (Advances in Data Analysis and Classification)
Model-based replacement of rounded zeros in compositional data: Classical and robust approaches (Computational Statistics & Data Analysis)
Is compositional data analysis a way to see beyond the illusion? (Computers & Geosciences)
Locally linear reconstruction based missing value imputation for supervised learning (Neurocomputing)

Quantified Score

Hi-index	0.03

Abstract

New imputation algorithms for estimating missing values in compositional data are introduced. The first proposal uses the k-nearest neighbor procedure based on the Aitchison distance, a distance measure designed specifically for compositional data. Here it is important to adjust the estimated missing values to the overall size of the compositional parts of the neighbors. The second proposal is an iterative model-based imputation technique that is initialized with the result of the proposed k-nearest neighbor procedure. It is based on iterative regressions and thereby takes the full multivariate information in the data into account. The regressions have to be performed in a transformed space, and, depending on the data quality, either classical or robust regression techniques can be employed. The proposed methods are tested on a real data set and on simulated data sets. The results show that the proposed methods outperform standard imputation methods. In the presence of outliers, the model-based method with robust regressions is preferable.
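The first proposal can be illustrated with a minimal Python sketch. It relies on the standard fact that the Aitchison distance equals the Euclidean distance between centered log-ratio (clr) transformed compositions. Everything else is an assumption made for illustration and is not taken from the paper: the function names, the restriction of donors to complete rows, the use of only the parts observed in the incomplete row when computing distances, and the median-ratio rescaling used to adjust for the size of the neighbors' parts.

```python
import numpy as np


def clr(x):
    """Centered log-ratio transform of a composition with strictly positive parts."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()


def aitchison_distance(x, y):
    """Aitchison distance, computed as the Euclidean distance between clr vectors."""
    return float(np.linalg.norm(clr(x) - clr(y)))


def knn_impute_cell(X, i, j, k=3):
    """Impute the missing cell X[i, j] (NaN) from the k nearest complete rows.

    Assumptions of this sketch (hypothetical, not the published algorithm):
    - distances use only the parts that are observed in row i;
    - each donor value is rescaled by the median ratio between row i's
      observed parts and the donor's corresponding parts, so the imputed
      value matches the overall size of row i;
    - the imputed value is the median of the rescaled donor values.
    """
    X = np.asarray(X, dtype=float)
    obs = ~np.isnan(X[i])
    # Candidate donors: all other rows without any missing part.
    donors = [r for r in range(X.shape[0]) if r != i and not np.isnan(X[r]).any()]
    donors.sort(key=lambda r: aitchison_distance(X[i, obs], X[r, obs]))
    nearest = donors[:k]
    scaled = [np.median(X[i, obs] / X[r, obs]) * X[r, j] for r in nearest]
    return float(np.median(scaled))


# Toy data: row 0 has a missing third part; rows 1 and 2 have the same
# ratios between the first two parts, row 3 does not.
X_demo = np.array([
    [1.0, 2.0, np.nan],
    [2.0, 4.0, 6.0],
    [10.0, 20.0, 30.0],
    [1.0, 5.0, 1.0],
])
imputed = knn_impute_cell(X_demo, i=0, j=2, k=2)
```

Because the Aitchison distance depends only on ratios between parts, rows 1 and 2 are equally close to row 0 even though their absolute sizes differ by a factor of five; the rescaling step is what brings their donor values back to the scale of row 0.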