Supervised learning and Co-training

Authors: Malte Darnstädt, Hans Ulrich Simon, Balázs Szörényi
Affiliations:
Malte Darnstädt, Hans Ulrich Simon: Fakultät für Mathematik, Ruhr-Universität Bochum, D-44780 Bochum, Germany
Balázs Szörényi: Hungarian Academy of Sciences and University of Szeged, Research Group on Artificial Intelligence, H-6720 Szeged, Hungary; and INRIA Lille, SequeL project, F-59650 Villeneuve d'Ascq, France
Venue: Theoretical Computer Science
Year: 2014

Abstract

Co-training under the Conditional Independence Assumption is among the models that demonstrate how radically the need for labeled data can be reduced when a large amount of unlabeled data is available. In this paper, we explore how much credit for this saving must be assigned solely to the extra assumptions underlying the Co-training Model. To this end, we compute general (almost tight) upper and lower bounds on the sample size needed to meet the success criterion of PAC-learning in the realizable case, within the model of Co-training under the Conditional Independence Assumption, in a purely supervised setting. The upper bounds lie significantly below the lower bounds for PAC-learning without Co-training. Thus, Co-training saves labeled data even when not combined with unlabeled data. On the other hand, the saving is much less radical than the known savings in the semi-supervised setting.
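For context, the comparison above is against the classical sample-size bounds for realizable PAC-learning without Co-training. These are standard results, not bounds derived in this paper. For a concept class of VC dimension $d \ge 2$, accuracy $\varepsilon$ and confidence $1-\delta$, the number of labeled examples $m(\varepsilon,\delta)$ needed (lower bound, for sufficiently small $\varepsilon$ and $\delta$) and sufficient for any consistent learner (upper bound) satisfies:

```latex
m(\varepsilon,\delta) = \Omega\!\left(\frac{d + \ln(1/\delta)}{\varepsilon}\right),
\qquad
m(\varepsilon,\delta) = O\!\left(\frac{d\,\ln(1/\varepsilon) + \ln(1/\delta)}{\varepsilon}\right).
```

The precise form of the Co-training bounds is not reproduced here; the abstract states only that they fall significantly below the lower bound on the left.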