Using classifier-based nominal imputation to improve machine learning

  • Authors:
  • Xiaoyuan Su;Russell Greiner;Taghi M. Khoshgoftaar;Amri Napolitano

  • Affiliations:
Xiaoyuan Su, Taghi M. Khoshgoftaar, Amri Napolitano: Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL; Russell Greiner: Department of Computing Science, University of Alberta, Edmonton, AB, Canada

  • Venue:
PAKDD'11: Proceedings of the 15th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
  • Year:
  • 2011


Abstract

Many learning algorithms perform poorly when the training data are incomplete. One standard approach first imputes the missing values, then gives the completed data to the learning algorithm. However, imputation is especially problematic when the features are nominal. This work presents "classifier-based nominal imputation" (CNI), an easy-to-implement and effective nominal imputation technique that views nominal imputation as classification: it learns a classifier for each feature (mapping the other features of an instance to the predicted value of that feature), then uses that classifier to predict the missing values of that feature. Our empirical results show that learners that preprocess their incomplete training data using CNI with support vector machine or decision tree learners achieve significantly higher predictive accuracy than learners that (1) do not use preprocessing, (2) use baseline imputation techniques, or (3) use the CNI preprocessor with other classification algorithms. This improvement is especially apparent when the base learner is instance-based. CNI is also found helpful for other base learners, such as naïve Bayes and decision tree, on incomplete nominal data.
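The per-feature imputation scheme described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: a 1-nearest-neighbour (Hamming distance) classifier stands in for the support vector machine and decision tree learners the paper actually evaluates, and the data layout (nominal values as strings, missing values as `None`) is assumed. The structure matches the abstract's description: one classifier per feature, trained on the rows where that feature is observed, mapping the remaining features to that feature's value.

```python
def hamming(a, b):
    # Distance between two partial rows: count disagreeing observed pairs.
    return sum(1 for x, y in zip(a, b)
               if x is not None and y is not None and x != y)

def cni_impute(rows, missing=None):
    """Classifier-based nominal imputation (CNI) sketch.

    For each feature j, the rows where feature j is observed serve as the
    training set for a classifier that predicts feature j from the other
    features; here the classifier is 1-NN under Hamming distance.
    """
    n_feats = len(rows[0])
    imputed = [list(r) for r in rows]
    for j in range(n_feats):
        # Training set for feature j: rows where feature j is observed.
        train = [r for r in rows if r[j] is not missing]
        for r in imputed:
            if r[j] is missing and train:
                # Drop feature j, then predict it from the nearest neighbour.
                others = lambda row: row[:j] + row[j + 1:]
                nearest = min(train,
                              key=lambda t: hamming(others(list(t)), others(r)))
                r[j] = nearest[j]
    return imputed

# Hypothetical toy data: the last row is missing its first feature.
data = [
    ["red", "round", "apple"],
    ["red", "round", "apple"],
    ["yellow", "long", "banana"],
    [None, "round", "apple"],
]
completed = cni_impute(data)  # the completed data would then be passed
                              # to the base learner
```

In this toy example the missing value is filled with "red", since the incomplete row agrees with the two "red" rows on its observed features. A decision tree or SVM classifier per feature, as the paper uses, would slot into the same loop in place of the 1-NN step.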