This paper presents a new loss function for neural network classification, inspired by the recently proposed similarity measure called correntropy. We show that this loss essentially behaves like the conventional square loss for samples that lie well within the decision boundary and have small errors, and like the L0 (counting) norm for samples that are outliers or are difficult to classify. Depending on the value of the kernel size parameter, the proposed loss function moves smoothly from convex to non-convex and becomes a close approximation to the misclassification loss (the ideal 0-1 loss). We show that the discriminant function obtained by training a neural network with the proposed loss in the neighborhood of the ideal 0-1 loss is resistant to overfitting, more robust to outliers, and generalizes consistently better than networks trained with other commonly used loss functions, even after prolonged training. The results also show that it is a close competitor to the SVM. Since the proposed method is compatible with simple gradient-based online learning, it offers a practical way to improve the performance of neural network classifiers.
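For concreteness, the NumPy sketch below illustrates one plausible form of such a correntropy-induced loss, L(e) = beta * (1 - exp(-e^2 / (2 * sigma^2))) with e the prediction error; the exact expression, the normalizing constant beta (chosen here so that L(1) = 1), and the default kernel size sigma = 0.5 are illustrative assumptions, not the paper's verbatim definitions.

import numpy as np

def c_loss(error, sigma=0.5):
    # Correntropy-inspired loss for prediction errors `error` (assumed form,
    # see lead-in). beta normalizes the loss so that an error of 1 costs 1.
    beta = 1.0 / (1.0 - np.exp(-1.0 / (2.0 * sigma ** 2)))
    return beta * (1.0 - np.exp(-error ** 2 / (2.0 * sigma ** 2)))

def c_loss_grad(error, sigma=0.5):
    # Derivative of the loss w.r.t. the error, as needed for simple
    # gradient-based online training of a neural network.
    beta = 1.0 / (1.0 - np.exp(-1.0 / (2.0 * sigma ** 2)))
    return beta * (error / sigma ** 2) * np.exp(-error ** 2 / (2.0 * sigma ** 2))

# Behavior described in the abstract:
# - small errors: 1 - exp(-e^2/2s^2) ~ e^2/(2s^2), i.e., square-loss-like;
# - large errors (outliers): the loss saturates at beta and the gradient
#   vanishes, so each outlier contributes like an L0/counting penalty.
errors = np.array([0.1, 0.5, 1.0, 3.0, 10.0])
print(c_loss(errors))       # saturates for large errors
print(c_loss_grad(errors))  # gradient vanishes for outliers

Varying sigma reproduces the transition described above: a large kernel size keeps the loss nearly quadratic (convex) over the working range of errors, while a small sigma sharpens it toward the ideal 0-1 loss.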