Variance inflation in high-dimensional Support Vector Machines

  • Authors:
  • Trine Julie Abrahamsen; Lars Kai Hansen

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2013

Abstract

Many important machine learning models, supervised and unsupervised, are based on simple Euclidean distances or orthogonal projections in a high-dimensional feature space. When estimating such models from small training sets we face the problem that the span of the training set input vectors does not cover the full input space. Hence, when the model is applied to future data it is effectively blind to the missed orthogonal subspace. This can lead to an inflated variance of the hidden variables estimated on the training set, and when the model is applied to test data we may find that the hidden variables follow a different probability law with less variance. While the problem, and basic means to reconstruct and deflate, are well understood in unsupervised learning, the supervised case is less well understood. We here investigate the effect of variance inflation in supervised learning, including the case of Support Vector Machines (SVMs), and we propose a non-parametric scheme to restore proper generalizability. We illustrate the algorithm and its ability to restore performance on a wide range of benchmark data sets.
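
The phenomenon described in the abstract is easy to reproduce. Below is a minimal sketch, not the paper's restoration scheme, using a hypothetical synthetic setup: two Gaussian classes, far fewer training points than dimensions, and a linear SVM from scikit-learn. The decision values (the "hidden variables") come out noticeably more dispersed on the training set than on test data, which is exactly the variance mismatch the paper addresses.

  import numpy as np
  from sklearn.svm import SVC

  rng = np.random.default_rng(0)
  d, n_train, n_test = 500, 40, 2000  # dimension far exceeds training set size

  def sample(n):
      # Two Gaussian classes whose means differ only along the first coordinate.
      y = rng.integers(0, 2, size=n)
      X = rng.standard_normal((n, d))
      X[:, 0] += 2.0 * (2 * y - 1)
      return X, y

  X_tr, y_tr = sample(n_train)
  X_te, y_te = sample(n_test)

  clf = SVC(kernel="linear", C=1.0).fit(X_tr, y_tr)

  f_tr = clf.decision_function(X_tr)  # latent scores; training inputs lie in the fitted span
  f_te = clf.decision_function(X_te)  # test inputs have mass in the unseen orthogonal subspace

  print("std of decision values, train: %.2f" % f_tr.std())
  print("std of decision values, test:  %.2f" % f_te.std())
  # Typically std(train) > std(test): in-sample the latent variable is
  # variance-inflated, while the test scores follow a narrower distribution.

The sketch only exposes the train/test mismatch; the paper's contribution is a non-parametric scheme that corrects for it so that generalization performance is restored.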