In this letter, we investigate the impact of choosing different loss functions from the viewpoint of statistical learning theory. We introduce a convexity assumption, which is met by all loss functions commonly used in the literature, and study how the bound on the estimation error changes with the loss. We also derive a general result on the minimizer of the expected risk for a convex loss function in the case of classification. The main outcome of our analysis is that for classification, the hinge loss appears to be the loss of choice. Other things being equal, the hinge loss leads to a convergence rate practically indistinguishable from the logistic loss rate and much better than the square loss rate. Furthermore, if the hypothesis space is sufficiently rich, the bounds obtained for the hinge loss are not loosened by the thresholding stage.
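For readers who want the formulas behind this comparison, a brief sketch follows, assuming the conventional margin-based definitions of the three losses for labels y ∈ {−1, +1} and writing η(x) = P(y = 1 | x) for the conditional probability of the positive class (a symbol introduced here only for illustration):

\[
V_{\text{hinge}}(y, f(x)) = \max\{0,\, 1 - y f(x)\}, \qquad
V_{\text{logistic}}(y, f(x)) = \log\bigl(1 + e^{-y f(x)}\bigr), \qquad
V_{\text{square}}(y, f(x)) = \bigl(1 - y f(x)\bigr)^{2}.
\]

Under these standard definitions, the minimizers of the expected risk over all measurable functions take the familiar forms
\[
f^{*}_{\text{hinge}}(x) = \operatorname{sign}\bigl(2\eta(x) - 1\bigr), \qquad
f^{*}_{\text{logistic}}(x) = \log\frac{\eta(x)}{1 - \eta(x)}, \qquad
f^{*}_{\text{square}}(x) = 2\eta(x) - 1.
\]

In particular, the hinge-loss minimizer is already the Bayes rule, whereas the logistic- and square-loss minimizers are real-valued and must still be thresholded at zero; this is consistent with the observation above that, for a sufficiently rich hypothesis space, the hinge-loss bounds are not loosened by the thresholding stage.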