Missing Value Imputation Using a Semi-supervised Rank Aggregation Approach

Authors:
Edson T. Matsubara;Ronaldo C. Prati;Gustavo E. Batista;Maria C. Monard
Affiliations:
Institute of Mathematics and Computer Science at University of São Paulo, São Carlos, Brazil ZIP Code 13560-970;Institute of Mathematics and Computer Science at University of São Paulo, São Carlos, Brazil ZIP Code 13560-970;Institute of Mathematics and Computer Science at University of São Paulo, São Carlos, Brazil ZIP Code 13560-970;Institute of Mathematics and Computer Science at University of São Paulo, São Carlos, Brazil ZIP Code 13560-970
Venue:
SBIA '08 Proceedings of the 19th Brazilian Symposium on Artificial Intelligence: Advances in Artificial Intelligence
Year:
2008

Abstract

One relevant problem in data quality is the presence of missing data. When missing data are abundant, effective ways of dealing with them can improve the performance of machine learning algorithms. Missing data can be treated using imputation: imputation methods replace missing values with values estimated from the available data. This paper presents Corai, an imputation algorithm that adapts Co-training, a multi-view semi-supervised learning algorithm. Corai is compared with other imputation methods from the literature on three UCI data sets, with different levels of missingness inserted into up to three attributes. The results show that Corai tends to perform well at higher percentages of missingness and with more attributes containing missing values.
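
To make the general idea of imputation concrete, the following is a minimal Python sketch of a simple baseline: each missing value is replaced by the mean (numeric attributes) or the mode (nominal attributes) of the observed values in the same column. This is not Corai, whose details are not given in this abstract. The function names `impute_column` and `impute_table`, the use of `None` to encode a missing value, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean

MISSING = None  # assumption: missing entries are encoded as None


def impute_column(values):
    """Return a copy of `values` with missing entries filled in.

    Numeric columns are filled with the mean of the observed values;
    all other columns are filled with the most frequent observed value.
    """
    observed = [v for v in values if v is not MISSING]
    if not observed:
        return list(values)  # nothing available to estimate from
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in observed
    )
    if is_numeric:
        fill = mean(observed)
    else:
        fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is MISSING else v for v in values]


def impute_table(rows):
    """Impute every column of a row-major table independently."""
    columns = list(zip(*rows))
    filled = [impute_column(col) for col in columns]
    return [list(row) for row in zip(*filled)]


# Hypothetical toy data: one numeric and one nominal attribute.
rows = [
    [5.0, "red"],
    [None, "red"],
    [7.0, None],
    [9.0, "blue"],
]
completed = impute_table(rows)
```

A baseline like this ignores relationships between attributes. Model-based methods instead predict each missing value from the other attributes of the same example.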