Imputation of missing values for semi-supervised data using the proximity in random forests

  • Authors:
  • Tsunenori Ishioka

  • Affiliations:
  • The National Center for University Entrance Examinations, Tokyo, Japan

  • Venue:
  • Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
  • Year:
  • 2012

Abstract

This paper presents a procedure that imputes missing values in semi-supervised data by using the proximity measure of random forests. We found that our method achieves a higher rate of correct classification than two other methods: a simple extension of Liaw's "rfImpute" to (un)supervised data, and the k-nearest neighbor method (kNN). Our method can handle missing values in the predictor variables as well as in the response variable. Imputation that uses random forests for semi-supervised cases in the training data set had not been implemented before.
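To illustrate what imputing with random-forest proximity means, here is a minimal Python sketch of a single update step in the style of Liaw's rfImpute. It is not the semi-supervised procedure of this paper. It assumes the terminal-node index of every case in every tree is already available; the `leaves` input and all function names are hypothetical. The proximity of two cases is the fraction of trees in which they share a terminal node. A missing numeric value is replaced by the proximity-weighted mean of the observed values, and a missing categorical value by the observed category with the largest summed proximity.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence


def proximity_matrix(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """leaves[i][t] is the terminal-node id of case i in tree t (assumed given).

    Proximity(i, j) = fraction of trees in which i and j share a terminal node.
    """
    n, n_trees = len(leaves), len(leaves[0])
    return [
        [sum(leaves[i][t] == leaves[j][t] for t in range(n_trees)) / n_trees
         for j in range(n)]
        for i in range(n)
    ]


def impute_numeric(values: List[Optional[float]],
                   prox: List[List[float]]) -> List[float]:
    """Fill each None with the proximity-weighted mean of the observed values."""
    observed = [j for j, v in enumerate(values) if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        weights = [prox[i][j] for j in observed]
        total = sum(weights)
        if total == 0:
            # No tree puts case i with any observed case: use the plain mean.
            filled.append(sum(values[j] for j in observed) / len(observed))
        else:
            filled.append(sum(w * values[j] for w, j in zip(weights, observed)) / total)
    return filled


def impute_categorical(values: List[Optional[Hashable]],
                       prox: List[List[float]]) -> List[Hashable]:
    """Fill each None with the observed category of largest summed proximity."""
    observed = [j for j, v in enumerate(values) if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    filled: List[Hashable] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        score: Dict[Hashable, float] = defaultdict(float)
        for j in observed:
            score[values[j]] += prox[i][j]
        # Ties (including all-zero scores) go to the first category seen.
        filled.append(max(score, key=score.get))
    return filled


if __name__ == "__main__":
    leaves = [[0, 0], [0, 1], [1, 1], [1, 1]]  # 4 cases, 2 trees (made-up)
    prox = proximity_matrix(leaves)
    print(impute_numeric([1.0, None, 3.0, 5.0], prox))  # [1.0, 3.0, 3.0, 5.0]
    print(impute_categorical(["a", "a", None, "b"], prox))  # ['a', 'a', 'b', 'b']
```

rfImpute itself starts from a rough fill (median or most frequent value), regrows the forest on the completed data, and repeats this update for several iterations; the sketch shows only one update.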