Probabilities of discrepancy between minima of cross-validation, Vapnik bounds and true risks

  • Authors:
  • Przemysław Klęsk

  • Affiliations:
  • Department of Methods of Artificial Intelligence and Applied Mathematics, Westpomeranian University of Technology, ul. Żołnierska 49, 71-210 Szczecin, Poland

  • Venue:
  • International Journal of Applied Mathematics and Computer Science
  • Year:
  • 2010

Abstract

Two known approaches to complexity selection are considered: n-fold cross-validation and structural risk minimization. In either approach, the complexity indicated as optimal (the minimum of a generalization error estimate or of a bound) may differ from the genuine minimum of the unknown true risks. In the paper, this problem is posed in a novel quantitative way. We state and prove theorems demonstrating how one can calculate pessimistic probabilities of discrepancy between these minima for given conditions of an experiment. The probabilities are calculated in terms of all relevant constants: the sample size, the number of cross-validation folds, the capacity of the set of approximating functions and bounds on this set. We report experiments carried out to validate the results.
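
To make the two selection rules concrete, here is a minimal Python sketch. It is not the paper's procedure: the function names, the polynomial model family, and the choice of the classical VC confidence term for indicator functions are all assumptions made for illustration, and the bounds used in the paper may take a different form.

```python
import numpy as np


def select_by_cross_validation(x, y, degrees, n_folds=5, seed=0):
    """Pick the polynomial degree with the smallest n-fold CV error.

    Hypothetical helper: polynomial degree stands in for "complexity".
    Returns (best_degree, {degree: mean validation MSE}).
    """
    rng = np.random.default_rng(seed)
    # Split the sample once so every degree is scored on the same folds.
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    scores = {}
    for d in degrees:
        fold_errors = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            coeffs = np.polyfit(x[train], y[train], d)
            pred = np.polyval(coeffs, x[test])
            fold_errors.append(np.mean((pred - y[test]) ** 2))
        scores[d] = float(np.mean(fold_errors))
    best = min(scores, key=scores.get)
    return best, scores


def vc_confidence(h, l, eta):
    """Classical VC confidence term for indicator functions (an assumption here).

    h: VC dimension (capacity), l: sample size,
    eta: allowed probability that the bound fails.
    Computes sqrt((h * (ln(2l/h) + 1) - ln(eta/4)) / l).
    """
    return float(np.sqrt((h * (np.log(2 * l / h) + 1) - np.log(eta / 4)) / l))
```

Under structural risk minimization one would pick the complexity minimizing the empirical risk plus a term such as `vc_confidence`, whereas cross-validation picks the minimizer of the averaged validation error. Neither minimizer is guaranteed to coincide with the minimizer of the true risk, which is the discrepancy the paper quantifies.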