Using unsupervised analysis to constrain generalization bounds for support vector classifiers

  • Authors:
  • Sergio Decherchi; Sandro Ridella; Rodolfo Zunino; Paolo Gastaldo; Davide Anguita

  • Affiliations:
  • Department of Biophysical and Electronics Engineering, Genoa University, Genoa, Italy (all authors)

  • Venue:
  • IEEE Transactions on Neural Networks
  • Year:
  • 2010

Abstract

A crucial issue in designing learning machines is selecting the correct model parameters. When the number of available samples is small, theoretical sample-based generalization bounds can prove effective, provided that they are tight and track the validation error correctly. The maximal discrepancy (MD) approach is a promising model-selection technique for support vector machines (SVMs): it estimates a classifier's generalization performance through multiple training cycles on randomly labeled data. This paper presents a general method for computing generalization bounds for SVMs that refers the SVM parameters to an unsupervised solution, and shows that this approach yields tight bounds and attains effective model selection. When estimating the generalization error, the unsupervised reference constrains the complexity of the learning machine, thereby possibly decreasing sharply the number of admissible hypotheses. Although the methodology has general value, the method described in the paper adopts vector quantization (VQ) as the representation paradigm and introduces a biased regularization approach in bound computation and learning. Experimental results validate the proposed method on complex real-world data sets.
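As a rough illustration of the two ideas the abstract mentions (not the paper's actual procedure), the sketch below estimates a maximal-discrepancy value by fitting a classifier on a half-relabeled sample, and uses a regularizer biased toward a reference weight vector `w0`. All names (`train_linear_svm`, `maximal_discrepancy`) and the subgradient-descent trainer are assumptions for the sketch; the paper would derive `w0` from a VQ codebook and use a proper SVM solver.

```python
import numpy as np

def train_linear_svm(X, y, w0=None, lr=0.1, epochs=300, lam=0.01):
    """Hinge-loss linear classifier fit by subgradient descent -- a
    lightweight stand-in for a full SVM solver.  The penalty pulls the
    weights toward a reference vector w0 ("biased regularization");
    w0 = 0 recovers the standard penalty, while the paper's idea is to
    take the reference from an unsupervised (VQ) solution."""
    m, d = X.shape
    w0 = np.zeros(d) if w0 is None else w0
    w, b = w0.copy(), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples violating the margin
        gw = lam * (w - w0) - (y[active, None] * X[active]).sum(axis=0) / m
        gb = -y[active].sum() / m
        w -= lr * gw
        b -= lr * gb
    return lambda Z: np.sign(Z @ w + b)

def maximal_discrepancy(X, y, train_fn):
    """Estimate the maximal discrepancy of the hypothesis class: flip
    the labels of the second half of the sample, fit a classifier on
    the relabeled set, and compare its error rates on the two halves
    (both measured with the true labels)."""
    n = len(y) // 2
    y_flip = y.copy()
    y_flip[n:] = -y_flip[n:]
    f = train_fn(X, y_flip)
    err1 = np.mean(f(X[:n]) != y[:n])
    err2 = np.mean(f(X[n:]) != y[n:])
    return err2 - err1

# Synthetic two-Gaussian problem, shuffled so each half mixes both classes.
rng = np.random.default_rng(0)
n = 100
X = np.vstack([rng.normal(-1.0, 1.0, (n, 2)), rng.normal(1.0, 1.0, (n, 2))])
y = np.array([-1] * n + [1] * n)
perm = rng.permutation(2 * n)
X, y = X[perm], y[perm]
md = maximal_discrepancy(X, y, train_linear_svm)
```

A small `md` suggests the hypothesis class cannot fit random relabelings, yielding a tighter generalization bound; constraining the class via the unsupervised reference is what the paper uses to shrink this quantity.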