Supervised learning, that is, learning from labeled examples, is an area of Machine Learning that has reached substantial maturity. It has generated general-purpose, practically successful algorithms, and its foundations are well understood, captured by theoretical frameworks such as the PAC learning model and Statistical Learning Theory. However, for many contemporary practical problems, such as classifying web pages or detecting spam, additional information is often available in the form of unlabeled data, which is typically much cheaper and more plentiful than labeled data. As a consequence, there has recently been substantial interest in semi-supervised learning, the use of unlabeled data together with labeled data, since any useful information that reduces the amount of labeled data needed can be a significant benefit. Several techniques have been developed for doing this, along with experimental results on a variety of learning problems. Unfortunately, the standard frameworks for reasoning about supervised learning do not capture the key aspects and assumptions underlying these semi-supervised methods.

In this article, we describe an augmented version of the PAC model designed for semi-supervised learning, which can be used to reason about many of the different approaches taken over the past decade in the Machine Learning community. The model provides a unified framework for analyzing when and why unlabeled data can help, within which one can study both sample-complexity and algorithmic issues. It can be viewed as an extension of the standard PAC model in which, in addition to a concept class C, one also proposes a compatibility notion: a type of compatibility that one believes the target concept should have with the underlying distribution of data. Unlabeled data is then potentially helpful because it allows one to estimate compatibility over the space of hypotheses, and to shrink the search space from the whole class C down to those hypotheses that, according to one's assumptions, are a priori reasonable with respect to the distribution. As we show, many of the assumptions underlying existing semi-supervised learning algorithms can be formulated in this framework.

After proposing the model, we analyze sample-complexity issues in this setting: how much of each type of data one should expect to need in order to learn well, and what key quantities these numbers depend on. We also consider the algorithmic question of how to efficiently optimize for natural classes and compatibility notions, and provide several algorithmic results, including an improved bound for Co-Training with linear separators when the distribution satisfies independence given the label.
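To make the role of unlabeled data in this model concrete, here is a minimal sketch of the two-step procedure the framework suggests, written for a finite hypothesis class. This is a reading of the model rather than the paper's own algorithm, and the names (semi_supervised_select, compat, tau) are illustrative: a compatibility score compat(h, x) in [0, 1] is averaged over unlabeled points to estimate how compatible each hypothesis is with the distribution, and labeled data is then used only to choose among the hypotheses that survive the compatibility cut.

```python
import numpy as np

def semi_supervised_select(hypotheses, compat, unlabeled, labeled, tau):
    """Prune the hypothesis space by compatibility estimated from unlabeled
    data, then minimize empirical error on the labeled sample."""
    hypotheses = list(hypotheses)
    # Step 1: estimate each hypothesis's compatibility with the underlying
    # distribution from the (cheap, plentiful) unlabeled sample, keeping
    # only those hypotheses that look a priori reasonable.
    survivors = [h for h in hypotheses
                 if np.mean([compat(h, x) for x in unlabeled]) >= 1.0 - tau]
    if not survivors:
        # Nothing met the compatibility threshold; fall back to the full class.
        survivors = hypotheses
    # Step 2: among the compatible hypotheses, return the one with the
    # lowest empirical error on the (scarce, expensive) labeled sample.
    return min(survivors,
               key=lambda h: np.mean([h(x) != y for x, y in labeled]))
```

Roughly speaking, the labeled sample complexity then scales with the capacity of the surviving set (for a finite class, the log of its size) rather than with that of all of C, which is the sense in which unlabeled data reduces the amount of labeled data needed.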