Machine Learning
This paper presents a procedure that imputes missing values by using random forests on semi-supervised data. We found that our method achieves a higher rate of correct classification than two other methods: a simple extension of Liaw's "rfImpute" for (un)supervised data, and the k-nearest neighbor method (kNN). Our method can handle missing predictor variables as well as missing response variables. Imputation that uses random forests for semi-supervised cases in the training data set has not been implemented before.
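To make the problem setting concrete, the following is a minimal sketch of the kNN baseline named above, applied to semi-supervised data in which both predictor values and class labels may be missing. It is not the random forest procedure of this paper, whose details are not given here. The function names `distance` and `knn_impute`, the distance measure (root mean squared difference over jointly observed predictors), and the toy data are all assumptions chosen for illustration.

```python
from collections import Counter
import math


def distance(a, b):
    """Root mean squared difference over predictors observed in both rows.

    Assumption for this sketch: rows with no jointly observed predictor
    are treated as infinitely far apart.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(X, y, k=3):
    """Return copies of X and y with None entries filled from the k nearest rows.

    Missing predictors are filled with the mean of the nearest observed values;
    missing labels are filled by majority vote among the nearest labeled rows.
    """
    X_out = [row[:] for row in X]
    y_out = y[:]
    for i, row in enumerate(X):
        # Rank the other rows by distance to row i, using the original data
        # so that earlier imputations do not influence later ones.
        neighbors = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: distance(row, X[j]),
        )
        for col, value in enumerate(row):
            if value is None:
                donors = [X[j][col] for j in neighbors if X[j][col] is not None][:k]
                if donors:
                    X_out[i][col] = sum(donors) / len(donors)
        if y[i] is None:
            votes = [y[j] for j in neighbors if y[j] is not None][:k]
            if votes:
                y_out[i] = Counter(votes).most_common(1)[0][0]
    return X_out, y_out


# Hypothetical toy data: two well-separated groups, with one missing
# predictor and one missing label in each group.
X = [
    [1.0, 2.0],
    [1.2, None],
    [0.9, 2.2],
    [8.0, 9.0],
    [8.2, 9.1],
    [None, 8.8],
]
y = ["a", None, "a", "b", "b", None]

X_filled, y_filled = knn_impute(X, y, k=2)
print(X_filled)
print(y_filled)
```

In this sketch each incomplete row borrows values only from rows that have the needed entry observed, so unlabeled rows can still contribute predictor values and labeled rows with missing predictors can still contribute votes.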