Reliability of Cross-Validation for SVMs in High-Dimensional, Low Sample Size Scenarios

Authors:
Sascha Klement;Amir Madany Mamlouk;Thomas Martinetz
Affiliations:
Institute for Neuro- and Bioinformatics, University of Lübeck;Institute for Neuro- and Bioinformatics, University of Lübeck;Institute for Neuro- and Bioinformatics, University of Lübeck
Venue:
ICANN '08 Proceedings of the 18th international conference on Artificial Neural Networks, Part I
Year:
2008

Abstract

Given two-class data, a support vector machine (SVM) learns a classifier that aims for good generalisation by maximising the minimal margin between the two classes. Its performance can be estimated with cross-validation. For data with a low sample size, however, high dimensionality can cause strong side effects that significantly bias the estimated performance of the classifier. Using simulated data, we illustrate the effects of high dimensionality on the cross-validation of both hard- and soft-margin SVMs. Based on theoretical proofs for the case where the dimensionality tends to infinity, we derive heuristics that can easily be used to check whether a given data set is subject to these effects.
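
To make the idea of a biased cross-validation estimate concrete, here is a minimal sketch of leave-one-out cross-validation on simulated high-dimensional, low sample size data. It is not the authors' experiment: a nearest-class-mean classifier is swapped in for the SVM to keep the example short and dependency-free, and the function names, sample size, dimensionality and seed are all assumptions chosen for illustration. The data are pure Gaussian noise with arbitrary balanced labels, so no classifier can have a true accuracy other than 50%.

```python
import random


def nearest_mean_predict(train_x, train_y, x):
    """Label x with the class (+1 or -1) whose training mean is closest.

    Stand-in classifier (an assumption of this sketch), not an SVM.
    """
    best_label, best_dist = None, float("inf")
    for label in (+1, -1):
        members = [p for p, y in zip(train_x, train_y) if y == label]
        # Component-wise mean of all training points in this class.
        mean = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(x, mean))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, score the held-out point."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += nearest_mean_predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)


def simulate(n_samples=8, n_dims=10000, seed=0):
    """Leave-one-out accuracy on pure noise with balanced, arbitrary labels."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(n_dims)] for _ in range(n_samples)]
    ys = [+1 if i % 2 == 0 else -1 for i in range(n_samples)]
    return leave_one_out_accuracy(xs, ys)


if __name__ == "__main__":
    print("leave-one-out accuracy on pure noise:", simulate())
```

With these settings the estimate typically falls far below the 50% chance level. Removing a sample leaves its class with fewer training points, so that class mean has a larger norm in high dimensions and the held-out point tends to lie closer to the mean of the other class. This shows only one way such a bias can arise; the behaviour of hard- and soft-margin SVMs is the subject of the paper itself.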