We extend the boosting paradigm to the realistic setting of agnostic learning, that is, to a setting where the training sample is generated by an arbitrary (unknown) probability distribution over examples and labels. We define a β-weak agnostic learner with respect to a hypothesis class F as follows: given a distribution P, it outputs some hypothesis h ∈ F whose error is at most er_P(F) + β, where er_P(F) is the minimal error of a hypothesis from F under the distribution P (note that for some distributions this bound may exceed one half). We show a boosting algorithm that, using such a weak agnostic learner, computes a hypothesis whose error is at most max{c_1(β) er_P(F)^{c_2(β)}, ε}, in time polynomial in 1/ε. While this generalization guarantee is significantly weaker than the one provided by the known PAC boosting algorithms, one should note that the assumption required of a β-weak agnostic learner is much weaker. In fact, an important virtue of the notion of weak agnostic learning is that in many cases such learning is achieved by efficient algorithms.
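To make the two ingredients concrete, here is a minimal sketch in Python. The weak learner below exhaustively minimizes weighted error over the class F of decision stumps, so on the empirical distribution it is a β-weak agnostic learner for that class (with β = 0 up to the search grid). The boosting wrapper uses standard AdaBoost-style reweighting purely for illustration; it is not the paper's algorithm and does not achieve the max{c_1(β) er_P(F)^{c_2(β)}, ε} guarantee. All function names are hypothetical.

```python
import numpy as np

def stump_weak_learner(X, y, w):
    """Return the decision stump in F minimizing weighted error under w.

    This is an (empirical) weak agnostic learner: its error is at most
    er_w(F) + 0 over the stump class, since the search is exhaustive.
    Labels y are assumed to be +/-1; w is a probability vector over rows.
    """
    n, d = X.shape
    best, best_err = None, np.inf
    for j in range(d):                      # feature to split on
        for thr in np.unique(X[:, j]):      # candidate threshold
            for sign in (1, -1):            # orientation of the stump
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = w[pred != y].sum()    # weighted 0/1 error
                if err < best_err:
                    best_err, best = err, (j, thr, sign)
    j, thr, sign = best
    return (lambda Z: np.where(Z[:, j] <= thr, sign, -sign)), best_err

def boost(X, y, rounds=10):
    """AdaBoost-style wrapper around the weak agnostic learner.

    Illustrative only: the agnostic-boosting algorithm in the paper uses
    a different (smoother) reweighting scheme to tolerate noisy labels.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)                 # start from the uniform distribution
    hyps, alphas = [], []
    for _ in range(rounds):
        h, err = stump_weak_learner(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) at the extremes
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * h(X))   # upweight mistakes, downweight hits
        w /= w.sum()                        # renormalize to a distribution
        hyps.append(h)
        alphas.append(alpha)
    return lambda Z: np.sign(sum(a * h(Z) for a, h in zip(alphas, hyps)))
```

On a linearly separable toy sample such as labels given by sign(x_0 − 0.5), a single stump already attains zero error, so the boosted combination classifies the sample perfectly after a few rounds.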