Artificial Intelligence in Medicine
Objectives: Many classification problems involve data with missing values, and in such cases data imputation is critical. This paper evaluates several statistical and machine learning imputation methods, including our novel multiple imputation ensemble approach, on different datasets.

Materials and methods: Several state-of-the-art approaches are compared on different datasets, and several state-of-the-art classifiers (including support vector machines and input decimated ensembles) are tested with several imputation methods. The novel approach proposed in this work is a multiple imputation method based on random subspace, in which each missing value is calculated considering a different cluster of the data. Clustering is performed with a fuzzy clustering approach.

Results: Our experiments show that the proposed multiple imputation approach, based on clustering and a random subspace classifier, outperforms several other state-of-the-art approaches. Using the Wilcoxon signed-rank test (null hypothesis rejected at a significance level of 0.05), we show that the best proposed approach is outperformed by the classifier trained on the original data (i.e., without missing values) only when 20% of the data are missing. Moreover, we show that coupling an imputation method with our cluster-based imputation outperforms the base method (significance level ~0.05).

Conclusion: Starting from the assumptions that the feature set must be partially redundant and that the redundancy is distributed randomly over the feature set, we have proposed a method that works quite well even when a large percentage of the features is missing (=30%). Our best approach is available (MATLAB code) at bias.csr.unibo.it/nanni/MI.rar.
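To make the idea of cluster-based multiple imputation over random subspaces more concrete, the following is a minimal Python sketch. It is not the authors' MATLAB implementation, and the abstract does not specify the algorithmic details, so everything here is an assumption made for illustration: the function names (`memberships`, `fuzzy_cmeans`, `impute_once`, `multiple_imputation`), the use of `None` as the missing-value marker, the choice to cluster only fully observed rows, and the rule of filling a missing value with the membership-weighted average of the cluster centres. Each imputed copy of the data is built from a different random subset of features, so each missing value is estimated under a different clustering.

```python
import random

MISSING = None  # marker for a missing feature value (assumption of this sketch)

def memberships(row, centres, dims, m=2.0):
    """Fuzzy c-means memberships of `row` to each centre, using only `dims`."""
    dists = [sum((row[j] - c[j]) ** 2 for j in dims) ** 0.5 + 1e-12
             for c in centres]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dl) ** power for dl in dists) for dk in dists]

def fuzzy_cmeans(rows, dims, n_clusters, rng, m=2.0, n_iter=30):
    """Cluster complete rows; distances use `dims`, centres cover all features."""
    centres = [list(r) for r in rng.sample(rows, n_clusters)]
    n_features = len(rows[0])
    for _ in range(n_iter):
        weights = [[u ** m for u in memberships(r, centres, dims, m)]
                   for r in rows]
        for k in range(n_clusters):
            total = sum(w[k] for w in weights)
            centres[k] = [sum(w[k] * r[j] for w, r in zip(weights, rows)) / total
                          for j in range(n_features)]
    return centres

def impute_once(data, subspace_size, n_clusters, rng):
    """Build one imputed copy of `data` from one random feature subspace."""
    n_features = len(data[0])
    complete = [r for r in data if MISSING not in r]
    subspace = rng.sample(range(n_features), subspace_size)
    centres = fuzzy_cmeans(complete, subspace, n_clusters, rng)
    imputed = []
    for row in data:
        observed = [j for j in range(n_features) if row[j] is not MISSING]
        # Prefer observed features inside the subspace; otherwise use all observed.
        dims = [j for j in subspace if j in observed] or observed
        u = memberships(row, centres, dims)
        imputed.append([row[j] if row[j] is not MISSING
                        else sum(uk * c[j] for uk, c in zip(u, centres))
                        for j in range(n_features)])
    return imputed

def multiple_imputation(data, n_imputations=5, subspace_size=2,
                        n_clusters=2, seed=0):
    """Return several imputed copies, one per random subspace."""
    rng = random.Random(seed)
    return [impute_once(data, subspace_size, n_clusters, rng)
            for _ in range(n_imputations)]

# Toy data: two well-separated groups plus two rows with a missing value.
data = [
    [1.0, 1.1, 0.9], [0.9, 1.0, 1.0], [1.1, 0.9, 1.1],
    [5.0, 5.1, 4.9], [4.9, 5.0, 5.1], [5.1, 4.9, 5.0],
    [1.0, MISSING, 1.0], [5.0, 5.0, MISSING],
]
datasets = multiple_imputation(data)
```

In an ensemble setting, one classifier would be trained on each imputed copy in `datasets` and their outputs combined, for example by averaging scores; the combination rule is likewise an assumption here, not something stated in the abstract.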