Some of the most successful machine learning algorithms, such as Support Vector Machines, are based on learning linear and kernel predictors with respect to a convex loss function, such as the hinge loss. For classification, a more natural loss function is the 0-1 loss. However, minimizing it is a non-convex problem for which no efficient algorithm is known. In this paper, we describe and analyze a new algorithm for learning linear or kernel predictors with respect to the 0-1 loss. The algorithm is parameterized by L, which quantifies the effective width of the region around the decision boundary in which the predictor may be uncertain. We show that, without any distributional assumptions and for any fixed L, the algorithm runs in polynomial time and learns a classifier whose 0-1 error exceeds that of the optimal such classifier by at most ε. We also prove a hardness result: under a certain cryptographic assumption, no algorithm can learn such classifiers in time polynomial in L.
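To make the contrast between the loss functions concrete, the following minimal Python sketch (illustrative only, not the paper's algorithm) compares the convex hinge loss, the non-convex 0-1 loss, and an L-Lipschitz sigmoidal surrogate whose steepness grows with L; the particular sigmoid below is an assumed choice for illustration, and the paper's exact transfer function may differ.

import numpy as np

def zero_one_loss(margin):
    # 0-1 loss: 1 when the sign of the prediction disagrees with the label.
    return (margin <= 0).astype(float)

def hinge_loss(margin):
    # Convex surrogate minimized by Support Vector Machines.
    return np.maximum(0.0, 1.0 - margin)

def sigmoidal_loss(margin, L=10.0):
    # An L-Lipschitz sigmoidal surrogate (an illustrative choice, not
    # necessarily the paper's transfer function): smooth, bounded in [0, 1],
    # and approaching the 0-1 loss as L grows. Its derivative is bounded in
    # magnitude by L, so 1/L controls the width of the uncertain region
    # around the decision boundary.
    return 1.0 / (1.0 + np.exp(4.0 * L * margin))

# margin = y * <w, x> for a label y in {-1, +1} and a linear predictor w.
margins = np.linspace(-1.0, 1.0, 5)
print(zero_one_loss(margins))   # [1. 1. 1. 0. 0.]
print(hinge_loss(margins))      # [2.  1.5 1.  0.5 0. ]
print(sigmoidal_loss(margins))  # ~[1.  1.  0.5 0.  0. ]

As the sketch suggests, minimizing the hinge loss is a convex problem, while the 0-1 loss is discontinuous at the boundary; the sigmoidal surrogate interpolates between the two, with larger L giving a closer (but harder to optimize) approximation of the 0-1 loss.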