Reliability of Cross-Validation for SVMs in High-Dimensional, Low Sample Size Scenarios

  • Authors:
  • Sascha Klement; Amir Madany Mamlouk; Thomas Martinetz

  • Affiliations:
  • Institute for Neuro- and Bioinformatics, University of Lübeck (all authors)

  • Venue:
  • ICANN '08 Proceedings of the 18th international conference on Artificial Neural Networks, Part I
  • Year:
  • 2008

Abstract

Given two-class data, a support vector machine (SVM) learns a classifier that aims for good generalisation by maximising the minimal margin between the two classes. Its performance can be evaluated using cross-validation. With low sample sizes, however, high dimensionality can cause strong side effects that significantly bias the estimated performance of the classifier. On simulated data, we illustrate the effects of high dimensionality on cross-validation of both hard- and soft-margin SVMs. Based on theoretical proofs for the case where the dimensionality tends to infinity, we derive heuristics that can easily be used to check whether or not a given data set is subject to these constraints.
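
The kind of check described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' experimental setup: it uses scikit-learn, draws pure Gaussian noise with balanced labels that are independent of the features, and approximates a hard-margin SVM by a linear SVM with a very large C. The function name, sample sizes, and the value of C are hypothetical choices made for this sketch.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC


def loo_error_on_noise(n_samples=20, n_features=1000, C=1e6, seed=0):
    """Leave-one-out error of a linear SVM on pure-noise data.

    Assumptions of this sketch (not taken from the paper):
    - features are i.i.d. standard normal and carry no class information,
    - labels are balanced, so n_samples should be even,
    - a very large C stands in for a hard-margin SVM; a small C
      (for example 1.0) gives a soft-margin SVM.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    y = np.array([0, 1] * (n_samples // 2))  # labels independent of X

    clf = SVC(kernel="linear", C=C)
    # Each fold holds out one sample, so each score is 0.0 or 1.0.
    accuracy = cross_val_score(clf, X, y, cv=LeaveOneOut())
    return 1.0 - accuracy.mean()


if __name__ == "__main__":
    for d in (2, 20, 200, 2000):
        print(d, loo_error_on_noise(n_features=d))
```

Because the labels carry no information, an unbiased estimate of the error would be close to 0.5. Comparing the printed values across dimensionalities gives a rough impression of whether the cross-validation estimate drifts away from that level as the number of features grows relative to the number of samples.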