In this letter, we investigate the impact of choosing different loss functions from the viewpoint of statistical learning theory. We introduce a convexity assumption, which is met by all loss functions commonly used in the literature, and study how the bound on the estimation error changes with the loss. We also derive a general result on the minimizer of the expected risk for a convex loss function in the case of classification. The main outcome of our analysis is that for classification, the hinge loss appears to be the loss of choice. Other things being equal, the hinge loss leads to a convergence rate practically indistinguishable from the logistic loss rate and much better than the square loss rate. Furthermore, if the hypothesis space is sufficiently rich, the bounds obtained for the hinge loss are not loosened by the thresholding stage.
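For readers who want the formulas behind this comparison, a brief sketch follows, assuming the conventional margin-based definitions of the three losses for labels y ∈ {−1, +1} and writing η(x) = P(y = 1 | x) for the conditional probability of the positive class (a symbol introduced here only for illustration):

\[
V_{\text{hinge}}(y, f(x)) = \max\{0,\, 1 - y f(x)\}, \qquad
V_{\text{logistic}}(y, f(x)) = \log\bigl(1 + e^{-y f(x)}\bigr), \qquad
V_{\text{square}}(y, f(x)) = \bigl(1 - y f(x)\bigr)^{2}.
\]

Under these standard definitions, the minimizers of the expected risk over all measurable functions take the familiar forms
\[
f^{*}_{\text{hinge}}(x) = \operatorname{sign}\bigl(2\eta(x) - 1\bigr), \qquad
f^{*}_{\text{logistic}}(x) = \log\frac{\eta(x)}{1 - \eta(x)}, \qquad
f^{*}_{\text{square}}(x) = 2\eta(x) - 1.
\]

In particular, the hinge-loss minimizer is already the Bayes rule, whereas the logistic- and square-loss minimizers are real-valued and must still be thresholded at zero; this is consistent with the observation above that, for a sufficiently rich hypothesis space, the hinge-loss bounds are not loosened by the thresholding stage.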