Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
Machine Learning
Semi-Supervised Learning on Riemannian Manifolds
Machine Learning
Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing)
Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing)
Principled Hybrids of Generative and Discriminative Models
CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 1
The asymptotics of semi-supervised learning in discriminative probabilistic models
Proceedings of the 25th international conference on Machine learning
Covariate Shift Adaptation by Importance Weighted Cross Validation
The Journal of Machine Learning Research
Semi-Supervised Learning
Density Ratio Estimation in Machine Learning
Density Ratio Estimation in Machine Learning
Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation
Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation
IEEE Transactions on Information Theory - Part 2
Hi-index | 0.00 |
In this paper we study statistical properties of semi-supervised learning, which is considered to be an important problem in the field of machine learning. In standard supervised learning only labeled data is observed, and classification and regression problems are formalized as supervised learning. On the other hand, in semi-supervised learning, unlabeled data is also obtained in addition to labeled data. Hence, the ability to exploit unlabeled data is important to improve prediction accuracy in semi-supervised learning. This problem is regarded as a semiparametric estimation problem with missing data. Under discriminative probabilistic models, it was considered that unlabeled data is useless to improve the estimation accuracy. Recently, the weighted estimator using unlabeled data achieves a better prediction accuracy compared to the learning method using only labeled data, especially when the discriminative probabilistic model is misspecified. That is, improvement under the semiparametric model with missing data is possible when the semiparametric model is misspecified. In this paper, we apply the density-ratio estimator to obtain the weight function in semi-supervised learning. Our approach is advantageous because the proposed estimator does not require well-specified probabilistic models for the probability of the unlabeled data. Based on statistical asymptotic theory, we prove that the estimation accuracy of our method outperforms supervised learning using only labeled data. Some numerical experiments present the usefulness of our methods.