Learning Kernel-Based Halfspaces with the 0-1 Loss

Authors:
Shai Shalev-Shwartz;Ohad Shamir;Karthik Sridharan
Affiliations:
shais@cs.huji.ac.il;ohadsh@microsoft.com;karthik@ttic.edu
Venue:
SIAM Journal on Computing
Year:
2011


Abstract

We describe and analyze a new algorithm for agnostically learning kernel-based halfspaces with respect to the 0-1 loss function. Unlike most of the previous formulations, which rely on surrogate convex loss functions (e.g., hinge-loss in support vector machines (SVMs) and log-loss in logistic regression), we provide finite time/sample guarantees with respect to the more natural 0-1 loss function. The proposed algorithm can learn kernel-based halfspaces in worst-case time $\mathrm{poly}(\exp(L\log(L/\epsilon)))$, for any distribution, where $L$ is a Lipschitz constant (which can be thought of as the reciprocal of the margin), and the learned classifier is worse than the optimal halfspace by at most $\epsilon$. We also prove a hardness result, showing that under a certain cryptographic assumption, no algorithm can learn kernel-based halfspaces in time polynomial in $L$.
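
To make the contrast between the 0-1 loss and the convex surrogates named in the abstract concrete, the following is a minimal sketch, not the algorithm from the paper. It evaluates a kernel-based halfspace under the 0-1 loss, the hinge loss, and the log-loss. The Gaussian kernel, the coefficient representation `alphas`/`support`, and all function names are illustrative assumptions that do not appear in the abstract.

```python
import math

def zero_one_loss(score, label):
    """0-1 loss: 0 if the sign of score agrees with label in {-1, +1}, else 1.
    A score of exactly 0 is counted as an error (a convention assumed here)."""
    return 0.0 if score * label > 0 else 1.0

def hinge_loss(score, label):
    """Hinge loss, the convex surrogate used by SVMs."""
    return max(0.0, 1.0 - label * score)

def log_loss(score, label):
    """Log-loss (natural logarithm), the convex surrogate used by logistic regression."""
    return math.log1p(math.exp(-label * score))

def gaussian_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel; the kernel choice is an assumption for illustration."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def kernel_score(alphas, support, x, kernel=gaussian_kernel):
    """Score of a kernel-based halfspace: sum_i alpha_i * k(support_i, x)."""
    return sum(a * kernel(s, x) for a, s in zip(alphas, support))

def average_loss(loss, alphas, support, data, kernel=gaussian_kernel):
    """Empirical average of a loss over labelled examples (x, y), y in {-1, +1}."""
    total = sum(loss(kernel_score(alphas, support, x, kernel), y) for x, y in data)
    return total / len(data)

# Hypothetical toy data: two support points and three labelled examples.
support = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, -1.0]
data = [((0.1, 0.0), 1), ((2.0, 1.9), -1), ((1.9, 2.0), 1)]

for name, loss in [("0-1", zero_one_loss), ("hinge", hinge_loss), ("log", log_loss)]:
    print(name, average_loss(loss, alphas, support, data))
```

On any example the hinge loss is at least as large as the 0-1 loss, which is why minimizing it controls the error rate only indirectly. The guarantee stated in the abstract is instead expressed directly in terms of the 0-1 loss.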