The Strength of Weak Learnability
Machine Learning
Toward Efficient Agnostic Learning
Machine Learning - Special issue on computational learning theory, COLT'92
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
Rademacher and gaussian complexities: risk bounds and structural results
The Journal of Machine Learning Research
Kernel Methods for Pattern Analysis
Kernel Methods for Pattern Analysis
Agnostically Learning Halfspaces
FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science
Hardness of Learning Halfspaces with Noise
FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
Cryptographic Hardness for Learning Intersections of Halfspaces
FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
New Results for Learning Noisy Parities and Halfspaces
FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
SVM optimization: inverse dependence on training set size
Proceedings of the 25th international conference on Machine learning
Alternative measures of computational complexity with applications to agnostic learning
TAMC'06 Proceedings of the Third international conference on Theory and Applications of Models of Computation
Hi-index | 0.00 |
We describe and analyze a new algorithm for agnostically learning kernel-based halfspaces with respect to the 0-1 loss function. Unlike most of the previous formulations, which rely on surrogate convex loss functions (e.g., hinge-loss in support vector machines (SVMs) and log-loss in logistic regression), we provide finite time/sample guarantees with respect to the more natural 0-1 loss function. The proposed algorithm can learn kernel-based halfspaces in worst-case time poly$(\exp(L\log(L/\epsilon)))$, for any distribution, where $L$ is a Lipschitz constant (which can be thought of as the reciprocal of the margin), and the learned classifier is worse than the optimal halfspace by at most $\epsilon$. We also prove a hardness result, showing that under a certain cryptographic assumption, no algorithm can learn kernel-based halfspaces in time polynomial in $L$.