Combining labeled and unlabeled data with co-training
COLT '98: Proceedings of the Eleventh Annual Conference on Computational Learning Theory
We consider the problem of learning classifiers from a small set of labeled examples and a large set of unlabeled examples. This situation arises in many applications, such as classifying medical images, web pages, and sensor data, where labeling examples is difficult and expensive while unlabeled examples are easy to acquire. We assume that the training data are drawn from a mixture model with Gaussian components. We propose an approach to selecting typical examples for learning classifiers, where typicality is defined with respect to the labeled data in terms of the squared Mahalanobis distance. We describe the algorithm for selecting typical examples: a training example is drawn at random and its typicality is measured; if the typicality exceeds a threshold, the example is retained. The number of typical examples retained is limited by the available memory capacity.
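The following is a minimal Python sketch of this selection scheme under stated assumptions: per-class Gaussian statistics are estimated from the labeled set, and typicality is taken as exp(-d²/2) of the smallest squared Mahalanobis distance to any class mean. The transform, the threshold value, and the names (fit_labeled_stats, typicality, select_typical) are illustrative choices, not the authors' implementation.

```python
import numpy as np

def fit_labeled_stats(X_labeled, y_labeled, ridge=1e-6):
    """Per-class mean and inverse covariance estimated from the labeled set."""
    stats = []
    for c in np.unique(y_labeled):
        Xc = X_labeled[y_labeled == c]
        mu = Xc.mean(axis=0)
        # Ridge term keeps the covariance invertible when labeled data is scarce.
        cov = np.cov(Xc, rowvar=False) + ridge * np.eye(Xc.shape[1])
        stats.append((mu, np.linalg.inv(cov)))
    return stats

def typicality(x, stats):
    """Typicality w.r.t. the labeled data: exp(-d2/2) of the smallest squared
    Mahalanobis distance to any class mean (an assumed concrete form; the
    abstract only says typicality is defined via this distance)."""
    d2 = min(float((x - mu) @ P @ (x - mu)) for mu, P in stats)
    return np.exp(-0.5 * d2)

def select_typical(X_unlabeled, stats, threshold=0.1, capacity=1000, seed=0):
    """Draw unlabeled examples in random order; keep one when its typicality
    exceeds the threshold; stop once the memory capacity is reached."""
    rng = np.random.default_rng(seed)
    selected = []
    for i in rng.permutation(len(X_unlabeled)):
        if len(selected) >= capacity:  # memory limit on sampled examples
            break
        if typicality(X_unlabeled[i], stats) > threshold:
            selected.append(X_unlabeled[i])
    return np.array(selected)
```

Note that the capacity check makes the memory bound explicit, mirroring the abstract's constraint that the number of sampled typical examples cannot exceed available storage.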