A central problem in learning is the selection of an appropriate model. This is typically done by estimating the unknown generalization errors of a set of candidate models and then choosing the model with the minimal generalization error estimate. In this article, we discuss the problem of model selection and generalization error estimation in the context of kernel regression models, e.g., kernel ridge regression, kernel subset regression, or Gaussian process regression. Previously, a non-asymptotic generalization error estimator called the subspace information criterion (SIC) was proposed, which could be successfully applied to finite-dimensional subspace models. SIC is an unbiased estimator of the generalization error in the finite-sample case under the conditions that the learning target function belongs to a specified reproducing kernel Hilbert space (RKHS) H and that the reproducing kernels centered on the training sample points span the whole space H. These conditions hold only if dim H ≤ l, where l is the number of training examples, so SIC was previously applicable only to finite-dimensional RKHSs. We show that even if the reproducing kernels centered on the training sample points do not span the whole space H, SIC is an unbiased estimator of an essential part of the generalization error. Our extension allows the use of any RKHS, including infinite-dimensional ones, i.e., the richer function classes commonly used in Gaussian processes, support vector machines, or boosting. We further show that when the kernel matrix is invertible, SIC can be expressed in a much simpler form, making its computation highly efficient. In computer simulations on ridge parameter selection with real and artificial data sets, SIC compares favorably with other standard model selection techniques such as leave-one-out cross-validation or an empirical Bayesian method.
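To make the experimental setting concrete, the sketch below illustrates ridge parameter selection for kernel ridge regression using one of the baselines named in the abstract, leave-one-out cross-validation with its exact closed form for linear-in-y estimators. It does not implement SIC itself; the Gaussian kernel, its width, the data, and the parameter grid are illustrative assumptions, not choices taken from the paper.

import numpy as np

def gaussian_kernel(X1, X2, width=1.0):
    # Gram matrix of a Gaussian RBF kernel (an assumed choice of RKHS kernel).
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def loo_error_krr(K, y, lam):
    # Exact leave-one-out squared error for kernel ridge regression,
    # using the hat matrix H = K (K + lam I)^{-1} and the shortcut
    # e_i = (y_i - yhat_i) / (1 - H_ii).
    n = len(y)
    H = K @ np.linalg.solve(K + lam * np.eye(n), np.eye(n))
    yhat = H @ y
    resid = (y - yhat) / (1.0 - np.diag(H))
    return np.mean(resid ** 2)

def select_ridge_parameter(X, y, lambdas, width=1.0):
    # Pick the ridge parameter with the smallest estimated generalization error.
    K = gaussian_kernel(X, X, width)
    scores = [loo_error_krr(K, y, lam) for lam in lambdas]
    return lambdas[int(np.argmin(scores))], scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(50, 1))
    y = np.sinc(X[:, 0]) + 0.1 * rng.standard_normal(50)
    lam, scores = select_ridge_parameter(X, y, lambdas=np.logspace(-4, 1, 20))
    print("selected ridge parameter:", lam)

Replacing loo_error_krr with an SIC-style criterion would keep the surrounding selection loop unchanged: any estimator of the generalization error can be plugged into the same argmin over the candidate ridge parameters.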