We extend the boosting paradigm to the realistic setting of agnostic learning, that is, to a setting where the training sample is generated by an arbitrary (unknown) probability distribution over examples and labels. We define a β-weak agnostic learner with respect to a hypothesis class F as follows: given a distribution P, it outputs some hypothesis h ∈ F whose error is at most er_P(F) + β, where er_P(F) is the minimal error of a hypothesis from F under the distribution P (note that for some distributions this bound may exceed one half). We show a boosting algorithm that, using such a weak agnostic learner, computes a hypothesis whose error is at most max{c_1(β) er_P(F)^{c_2(β)}, ε}, in time polynomial in 1/ε. While this generalization guarantee is significantly weaker than the one provided by the known PAC boosting algorithms, one should note that the assumption required of a β-weak agnostic learner is much weaker. In fact, an important virtue of the notion of weak agnostic learning is that in many cases such learning is achieved by efficient algorithms.
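To make the two ingredients concrete, here is a minimal sketch in Python. The weak learner below exhaustively minimizes weighted error over the class F of decision stumps, so on the empirical distribution it is a β-weak agnostic learner for that class (with β = 0 up to the search grid). The boosting wrapper uses standard AdaBoost-style reweighting purely for illustration; it is not the paper's algorithm and does not achieve the max{c_1(β) er_P(F)^{c_2(β)}, ε} guarantee. All function names are hypothetical.

```python
import numpy as np

def stump_weak_learner(X, y, w):
    """Return the decision stump in F minimizing weighted error under w.

    This is an (empirical) weak agnostic learner: its error is at most
    er_w(F) + 0 over the stump class, since the search is exhaustive.
    Labels y are assumed to be +/-1; w is a probability vector over rows.
    """
    n, d = X.shape
    best, best_err = None, np.inf
    for j in range(d):                      # feature to split on
        for thr in np.unique(X[:, j]):      # candidate threshold
            for sign in (1, -1):            # orientation of the stump
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = w[pred != y].sum()    # weighted 0/1 error
                if err < best_err:
                    best_err, best = err, (j, thr, sign)
    j, thr, sign = best
    return (lambda Z: np.where(Z[:, j] <= thr, sign, -sign)), best_err

def boost(X, y, rounds=10):
    """AdaBoost-style wrapper around the weak agnostic learner.

    Illustrative only: the agnostic-boosting algorithm in the paper uses
    a different (smoother) reweighting scheme to tolerate noisy labels.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)                 # start from the uniform distribution
    hyps, alphas = [], []
    for _ in range(rounds):
        h, err = stump_weak_learner(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) at the extremes
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * h(X))   # upweight mistakes, downweight hits
        w /= w.sum()                        # renormalize to a distribution
        hyps.append(h)
        alphas.append(alpha)
    return lambda Z: np.sign(sum(a * h(Z) for a, h in zip(alphas, hyps)))
```

On a linearly separable toy sample such as labels given by sign(x_0 − 0.5), a single stump already attains zero error, so the boosted combination classifies the sample perfectly after a few rounds.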