A well-known result by Stein (1956) shows that in particular situations, biased estimators can yield better parameter estimates than their generally preferred unbiased counterparts. This letter follows the same spirit: we stabilize unbiased generalization error estimates by regularization and thereby obtain more robust model selection criteria for learning. We trade a small bias against a larger variance reduction, which has the beneficial effect of yielding more precise estimates on a single training set. We focus on the subspace information criterion (SIC), an unbiased estimator of the expected generalization error measured by the reproducing kernel Hilbert space norm. SIC can be applied to kernel regression, and earlier experiments showed that a small regularization of SIC has a stabilizing effect. However, it remained open how to appropriately determine the degree of regularization in SIC. In this letter, we derive an unbiased estimator of the expected squared error between SIC and the expected generalization error, and we propose determining the degree of regularization of SIC such that this estimator is minimized. Computer simulations with artificial and real data sets illustrate that the proposed method effectively improves the precision of SIC, especially in high-noise cases. We furthermore compare the proposed method with the original SIC, cross-validation, and an empirical Bayesian method for ridge parameter selection, with good results.
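To make the quantities concrete, the following is a minimal numerical sketch of SIC-based ridge parameter selection in kernel regression. It assumes a Gaussian kernel, a known noise variance sigma2, and a fixed, purely illustrative stabilizer gamma in the reference fit: gamma = 0 recovers the original unbiased SIC, and gamma > 0 corresponds to the regularized SIC whose degree the letter proposes to tune. All names, the toy data, and the search grid are assumptions for illustration, not the authors' implementation; in particular, the letter's unbiased estimator for choosing gamma is not reproduced here.

import numpy as np

def gaussian_kernel(X, Z, width=1.0):
    # Gram matrix K[i, j] = exp(-||x_i - z_j||^2 / (2 * width^2)).
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def sic(K, y, lam, sigma2, gamma=0.0):
    # SIC score for the kernel ridge estimate alpha_hat = (K + lam*I)^{-1} y.
    # The generalization error is measured in the RKHS norm,
    #   ||f_hat - f||^2 = (alpha_hat - alpha*)' K (alpha_hat - alpha*);
    # SIC replaces the unknown alpha* by the reference alpha_u = (K + gamma*I)^{-1} y
    # (unbiased when gamma = 0) and subtracts the noise-induced bias of the
    # cross term. gamma > 0 is the stabilizing regularization discussed above.
    n = len(y)
    L = np.linalg.solve(K + lam * np.eye(n), np.eye(n))      # fit's response matrix
    L_u = np.linalg.solve(K + gamma * np.eye(n), np.eye(n))  # reference response matrix
    a, a_u = L @ y, L_u @ y
    # Terms independent of lam are dropped, so SIC matches the expected
    # generalization error only up to a model-independent constant.
    return a @ K @ a - 2.0 * a @ K @ a_u + 2.0 * sigma2 * np.trace(K @ L @ L_u.T)

# Toy usage: pick the ridge parameter that minimizes (regularized) SIC.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(50, 1))
sigma2 = 0.09                                    # noise variance, assumed known here
y = np.sin(X[:, 0]) + rng.normal(0.0, np.sqrt(sigma2), size=50)
K = gaussian_kernel(X, X)
lams = np.logspace(-4, 1, 30)
scores = [sic(K, y, lam, sigma2, gamma=1e-3) for lam in lams]
print("SIC-selected ridge parameter:", lams[int(np.argmin(scores))])

Note the design tension the sketch exposes: with gamma = 0 the reference fit inverts the raw Gram matrix, which is often ill-conditioned, so the SIC score itself becomes noisy. A small gamma trades a little bias in the reference for a large variance reduction of the criterion, which is exactly the trade-off the letter resolves by minimizing its unbiased estimate of the expected squared error between SIC and the expected generalization error; here gamma is simply fixed for illustration.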