Combining labeled and unlabeled data with co-training
COLT '98: Proceedings of the Eleventh Annual Conference on Computational Learning Theory
We consider the problem of learning classifiers from a small set of labeled examples and a large set of unlabeled examples. This situation arises in many applications, such as classifying medical images, web pages, and sensor data, where labeling examples is difficult and expensive while unlabeled examples are easy to acquire. We assume that the training data are drawn from a mixture model with Gaussian components. We propose an approach to selecting typical examples for learning classifiers, where typicality is defined with respect to the labeled data in terms of the squared Mahalanobis distance. We describe the algorithm for selecting typical examples: a training example is drawn at random and its typicality is measured; if the typicality exceeds a threshold, the example is retained. The number of typical examples retained is limited by the available memory capacity.
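The following is a minimal Python sketch of this selection scheme under stated assumptions: per-class Gaussian statistics are estimated from the labeled set, and typicality is taken as exp(-d²/2) of the smallest squared Mahalanobis distance to any class mean. The transform, the threshold value, and the names (fit_labeled_stats, typicality, select_typical) are illustrative choices, not the authors' implementation.

```python
import numpy as np

def fit_labeled_stats(X_labeled, y_labeled, ridge=1e-6):
    """Per-class mean and inverse covariance estimated from the labeled set."""
    stats = []
    for c in np.unique(y_labeled):
        Xc = X_labeled[y_labeled == c]
        mu = Xc.mean(axis=0)
        # Ridge term keeps the covariance invertible when labeled data is scarce.
        cov = np.cov(Xc, rowvar=False) + ridge * np.eye(Xc.shape[1])
        stats.append((mu, np.linalg.inv(cov)))
    return stats

def typicality(x, stats):
    """Typicality w.r.t. the labeled data: exp(-d2/2) of the smallest squared
    Mahalanobis distance to any class mean (an assumed concrete form; the
    abstract only says typicality is defined via this distance)."""
    d2 = min(float((x - mu) @ P @ (x - mu)) for mu, P in stats)
    return np.exp(-0.5 * d2)

def select_typical(X_unlabeled, stats, threshold=0.1, capacity=1000, seed=0):
    """Draw unlabeled examples in random order; keep one when its typicality
    exceeds the threshold; stop once the memory capacity is reached."""
    rng = np.random.default_rng(seed)
    selected = []
    for i in rng.permutation(len(X_unlabeled)):
        if len(selected) >= capacity:  # memory limit on sampled examples
            break
        if typicality(X_unlabeled[i], stats) > threshold:
            selected.append(X_unlabeled[i])
    return np.array(selected)
```

Note that the capacity check makes the memory bound explicit, mirroring the abstract's constraint that the number of sampled typical examples cannot exceed available storage.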