Imputation of missing values for semi-supervised data using the proximity in random forests

Authors:
Tsunenori Ishioka
Affiliations:
The National Center for University Entrance Examinations, Tokyo, Japan
Venue:
Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
Year:
2012

Random Forests

Machine Learning

Abstract

This paper presents a procedure that uses random forests to impute missing values in semi-supervised data. We found that our method achieves a higher rate of correct classification than two alternatives: a simple extension of Liaw's "rfImpute" to (un)supervised data, and the k-nearest neighbor method (kNN). Our method can handle missing predictor variables as well as a missing response variable. Imputation that uses random forests for semi-supervised cases in the training data set has not been implemented before.
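
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general proximity-based idea behind rfImpute-style imputation, not the author's method. It assumes a forest has already been grown and that we know which leaf each sample reaches in each tree. The proximity of two samples is the fraction of trees in which they share a leaf, and a missing continuous predictor is filled with the proximity-weighted average of the observed values. The function names `proximity_matrix` and `impute_column` and the toy leaf assignments are hypothetical, and the sketch does not address a missing response variable.

```python
from typing import List, Optional


def proximity_matrix(leaves: List[List[int]]) -> List[List[float]]:
    """leaves[i][t] is the leaf that sample i reaches in tree t.

    Returns prox[i][j], the fraction of trees in which samples i and j
    end up in the same leaf.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(1 for t in range(n_trees) if leaves[i][t] == leaves[j][t])
            prox[i][j] = shared / n_trees
    return prox


def impute_column(values: List[Optional[float]],
                  prox: List[List[float]]) -> List[float]:
    """Fill None entries with the proximity-weighted mean of observed entries."""
    observed = [j for j, v in enumerate(values) if v is not None]
    filled = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        weights = [prox[i][j] for j in observed]
        total = sum(weights)
        if total == 0.0:
            # Assumption: fall back to the plain mean when the sample
            # shares no leaf with any observed sample.
            filled.append(sum(values[j] for j in observed) / len(observed))
        else:
            weighted = sum(w * values[j] for w, j in zip(weights, observed))
            filled.append(weighted / total)
    return filled


# Hypothetical toy data: 4 samples, 3 trees; sample 3 has a missing value.
leaves = [
    [0, 1, 2],
    [0, 1, 3],
    [4, 5, 6],
    [0, 5, 2],
]
values = [1.0, 3.0, 10.0, None]

prox = proximity_matrix(leaves)
imputed = impute_column(values, prox)
print(imputed)
```

In a full rfImpute-style loop this step would be repeated: start from a rough fill, grow a forest, recompute proximities, and update the imputed values. Categorical predictors would take the category with the largest average proximity instead of a weighted mean.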