Selection-fusion approach for classification of datasets with missing values

Authors:
Mostafa Ghannad-Rezaie;Hamid Soltanian-Zadeh;Hao Ying;Ming Dong
Affiliations:
Department of Diagnostic Radiology, Henry Ford Hospital, Detroit, MI 48202, USA and Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI 48202, USA and Department ...;Department of Diagnostic Radiology, Henry Ford Hospital, Detroit, MI 48202, USA and Control and Intelligent Processing Center of Excellence, Electrical and Computer Engineering Department, Univers ...;Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI 48202, USA;Department of Computer Science, Wayne State University, Detroit, MI 48202, USA
Venue:
Pattern Recognition
Year:
2010



Abstract

This paper proposes a new approach to classifying incomplete data, based on discovering patterns in the missing values. The approach is designed for datasets with a small number of samples and a high percentage of missing values, where existing missing-value treatments usually perform poorly. Using the pattern of missing values, the proposed approach finds subsets of samples for which most of the features are available and trains a classifier on each subset. It then combines the outputs of the classifiers. Subset selection is cast as a clustering problem, which allows a mathematical framework to be derived for it. There is a trade-off between computational complexity (the number of subsets) and the accuracy of the overall classifier. To manage this trade-off, a numerical criterion is proposed for predicting the overall performance. The proposed method is applied to eight datasets: seven from the popular University of California, Irvine data mining archive and an epilepsy dataset from Henry Ford Hospital, Detroit, Michigan. Experimental results show that the classification accuracy of the proposed method is superior to that of the widely used multiple imputation method and four other methods. They also show that the degree of superiority depends on the pattern and percentage of missing values.
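
The selection-fusion idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no details of the clustering formulation, the base classifiers, the fusion rule, or the performance criterion. The sketch assumes that each distinct missing-value pattern defines one candidate feature set (in place of clustering), that each subset is handled by a nearest-centroid classifier, and that outputs are fused by an unweighted majority vote. All names (`MISSING`, `select_subsets`, `fit`, `predict`) are hypothetical.

```python
from collections import Counter, defaultdict

MISSING = None  # assumed marker for a missing feature value


def pattern(sample):
    """Return the indices of the features that are observed in a sample."""
    return tuple(i for i, v in enumerate(sample) if v is not MISSING)


def select_subsets(X, min_size=2):
    """Map each candidate feature set to the rows that have all its features.

    Simplification: every distinct missing-value pattern is one candidate.
    The paper instead casts subset selection as a clustering problem.
    """
    subsets = defaultdict(list)
    for feats in {pattern(x) for x in X}:
        if not feats:
            continue  # a sample with no observed features defines no subset
        for row, x in enumerate(X):
            if all(x[i] is not MISSING for i in feats):
                subsets[feats].append(row)
    return {f: rows for f, rows in subsets.items() if len(rows) >= min_size}


def train_centroids(X, y, feats, rows):
    """Compute per-class centroids over the given features and rows."""
    sums, counts = {}, Counter()
    for r in rows:
        acc = sums.setdefault(y[r], [0.0] * len(feats))
        for k, i in enumerate(feats):
            acc[k] += X[r][i]
        counts[y[r]] += 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def fit(X, y, min_size=2):
    """Train one nearest-centroid classifier per selected subset."""
    subsets = select_subsets(X, min_size)
    return {f: train_centroids(X, y, f, rows) for f, rows in subsets.items()}


def predict(model, x):
    """Fuse by majority vote over the classifiers whose features x has."""
    votes = Counter()
    for feats, centroids in model.items():
        if any(x[i] is MISSING for i in feats):
            continue  # this classifier cannot be applied to x
        nearest = min(
            centroids,
            key=lambda c: sum((x[i] - m) ** 2
                              for i, m in zip(feats, centroids[c])),
        )
        votes[nearest] += 1
    return votes.most_common(1)[0][0] if votes else None
```

Here the number of subsets is controlled only indirectly, through `min_size`: raising it discards small subsets and so trades coverage for fewer classifiers. That is a crude stand-in for the complexity/accuracy trade-off the abstract mentions, not the proposed numerical criterion.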