Semi-supervised learning with density-ratio estimation

Authors:
Masanori Kawakita; Takafumi Kanamori
Affiliations:
Department of Informatics, Kyushu University, Nishi-ku, Fukuoka, Japan 819-0395; Department of Computer Science and Mathematical Informatics, Nagoya University, Chikusaku, Nagoya, Japan 464-8603
Venue:
Machine Learning
Year:
2013

Abstract

In this paper we study the statistical properties of semi-supervised learning, an important problem in machine learning. In standard supervised learning, which covers classification and regression, only labeled data are observed. In semi-supervised learning, unlabeled data are available in addition to labeled data, so the ability to exploit unlabeled data is key to improving prediction accuracy. This problem can be regarded as a semiparametric estimation problem with missing data. Under discriminative probabilistic models, unlabeled data were considered useless for improving estimation accuracy. Recently, however, a weighted estimator using unlabeled data was shown to achieve better prediction accuracy than learning from labeled data alone, especially when the discriminative probabilistic model is misspecified. That is, improvement under the semiparametric model with missing data is possible when the semiparametric model is misspecified. In this paper, we apply a density-ratio estimator to obtain the weight function in semi-supervised learning. Our approach is advantageous because the proposed estimator does not require a well-specified probabilistic model for the distribution of the unlabeled data. Based on statistical asymptotic theory, we prove that the estimation accuracy of our method outperforms that of supervised learning using only labeled data. Numerical experiments demonstrate the usefulness of our method.
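
The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator: the abstract does not say which density ratio is used or how it is modeled, so the sketch assumes a ratio of the unlabeled-input density to the labeled-input density, fitted by regularized least squares on Gaussian kernel basis functions, followed by a weighted least-squares fit of a deliberately misspecified linear model. All function names, the synthetic data, and the parameter values (`sigma`, `lam`, `n_centers`) are hypothetical.

```python
import numpy as np


def gaussian_kernel(x, centers, sigma):
    """Gaussian basis functions; x is (n, d), centers is (b, d), result is (n, b)."""
    sq = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))


def fit_density_ratio(x_num, x_den, sigma=1.0, lam=0.1, n_centers=50, seed=0):
    """Least-squares fit of w(x) ~ p_num(x) / p_den(x) (assumed ratio model)."""
    rng = np.random.default_rng(seed)
    n_c = min(n_centers, len(x_num))
    centers = x_num[rng.choice(len(x_num), size=n_c, replace=False)]
    phi_den = gaussian_kernel(x_den, centers, sigma)
    phi_num = gaussian_kernel(x_num, centers, sigma)
    H = phi_den.T @ phi_den / len(x_den)   # second moment under the denominator sample
    h = phi_num.mean(axis=0)               # first moment under the numerator sample
    alpha = np.linalg.solve(H + lam * np.eye(n_c), h)
    alpha = np.maximum(alpha, 0.0)         # keep the estimated ratio non-negative
    return lambda x: gaussian_kernel(x, centers, sigma) @ alpha


def weighted_least_squares(x, y, w):
    """Fit y ~ theta[0] + theta[1:] . x, minimizing the w-weighted squared error."""
    X = np.hstack([np.ones((len(x), 1)), x])
    WX = X * w[:, None]
    gram = X.T @ WX + 1e-8 * np.eye(X.shape[1])  # tiny ridge for numerical safety
    return np.linalg.solve(gram, WX.T @ y)


# Synthetic data (hypothetical): few labeled points, many unlabeled inputs.
rng = np.random.default_rng(0)
x_lab = rng.normal(size=(40, 1))
y_lab = np.sin(x_lab[:, 0]) + 0.1 * rng.normal(size=40)
x_unl = rng.normal(size=(400, 1))

ratio = fit_density_ratio(x_unl, x_lab)        # numerator: unlabeled, denominator: labeled
w = ratio(x_lab)                               # weights for the labeled sample
theta = weighted_least_squares(x_lab, y_lab, w)
```

The linear model is fitted to a sinusoidal target on purpose, since the abstract ties the benefit of weighting to the misspecified case. Whether the weighted fit actually improves on the unweighted one depends on the data and on the tuning of `sigma` and `lam`, which this sketch does not address.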