Supervised learning, that is, learning from labeled examples, is an area of Machine Learning that has reached substantial maturity. It has generated general-purpose, practically successful algorithms, and its foundations are well understood, captured by theoretical frameworks such as the PAC learning model and Statistical Learning Theory. However, for many contemporary practical problems, such as classifying web pages or detecting spam, additional information is often available in the form of unlabeled data, which is typically much cheaper and more plentiful than labeled data. As a consequence, there has recently been substantial interest in semi-supervised learning, the use of unlabeled data together with labeled data, since any useful information that reduces the amount of labeled data needed can be a significant benefit. Several techniques have been developed for doing this, along with experimental results on a variety of learning problems. Unfortunately, the standard frameworks for reasoning about supervised learning do not capture the key aspects and assumptions underlying these semi-supervised methods.

In this article, we describe an augmented version of the PAC model designed for semi-supervised learning, which can be used to reason about many of the different approaches taken over the past decade in the Machine Learning community. The model provides a unified framework for analyzing when and why unlabeled data can help, within which one can study both sample-complexity and algorithmic issues. It can be viewed as an extension of the standard PAC model in which, in addition to a concept class C, one also proposes a compatibility notion: a type of compatibility that one believes the target concept should have with the underlying distribution of data. Unlabeled data is then potentially helpful because it allows one to estimate compatibility over the space of hypotheses, and to shrink the search space from the whole class C down to those hypotheses that, according to one's assumptions, are a priori reasonable with respect to the distribution. As we show, many of the assumptions underlying existing semi-supervised learning algorithms can be formulated in this framework.

After proposing the model, we analyze sample-complexity issues in this setting: how much of each type of data one should expect to need in order to learn well, and what key quantities these numbers depend on. We also consider the algorithmic question of how to efficiently optimize for natural classes and compatibility notions, and provide several algorithmic results, including an improved bound for Co-Training with linear separators when the distribution satisfies independence given the label.
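To make the role of unlabeled data in this model concrete, here is a minimal sketch of the two-step procedure the framework suggests, written for a finite hypothesis class. This is a reading of the model rather than the paper's own algorithm, and the names (semi_supervised_select, compat, tau) are illustrative: a compatibility score compat(h, x) in [0, 1] is averaged over unlabeled points to estimate how compatible each hypothesis is with the distribution, and labeled data is then used only to choose among the hypotheses that survive the compatibility cut.

```python
import numpy as np

def semi_supervised_select(hypotheses, compat, unlabeled, labeled, tau):
    """Prune the hypothesis space by compatibility estimated from unlabeled
    data, then minimize empirical error on the labeled sample."""
    hypotheses = list(hypotheses)
    # Step 1: estimate each hypothesis's compatibility with the underlying
    # distribution from the (cheap, plentiful) unlabeled sample, keeping
    # only those hypotheses that look a priori reasonable.
    survivors = [h for h in hypotheses
                 if np.mean([compat(h, x) for x in unlabeled]) >= 1.0 - tau]
    if not survivors:
        # Nothing met the compatibility threshold; fall back to the full class.
        survivors = hypotheses
    # Step 2: among the compatible hypotheses, return the one with the
    # lowest empirical error on the (scarce, expensive) labeled sample.
    return min(survivors,
               key=lambda h: np.mean([h(x) != y for x, y in labeled]))
```

Roughly speaking, the labeled sample complexity then scales with the capacity of the surviving set (for a finite class, the log of its size) rather than with that of all of C, which is the sense in which unlabeled data reduces the amount of labeled data needed.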