While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent proliferation of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While GPs are in general equivalent to infinite linear models, we show that Reduced Rank Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP so that it is not degenerate at test time. Training an RRGP involves both learning the covariance function hyperparameters and selecting the support set. We propose a method for learning the hyperparameters for a given support set. We also review the Sparse Greedy GP (SGGP) approximation (Smola and Bartlett, 2001), a method for learning the support set for given hyperparameters by approximating the posterior. We propose an alternative to the SGGP with better generalization properties. Finally, we present experiments comparing the different ways of training an RRGP. We provide some Matlab code for learning RRGPs.
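To illustrate the finite sparse linear model view of the RRGP mentioned above, the following NumPy sketch computes the posterior over the weights of the model f(x) = sum_i alpha_i k(x, x_i) defined on a support set, and the resulting (degenerate, i.e. not augmented at test time) predictive distribution. This is not the authors' Matlab code; the squared-exponential kernel, the fixed hyperparameters, and the randomly chosen support set are assumptions made purely for the example.

```python
import numpy as np

def se_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the row vectors of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return signal_var * np.exp(-0.5 * d2 / lengthscale**2)

def rrgp_fit(X, y, X_support, noise_var=0.1, jitter=1e-8):
    """Posterior over the weights alpha of the finite linear model
    f(x) = sum_i alpha_i k(x, x_i), with prior alpha ~ N(0, Kmm^{-1})."""
    Knm = se_kernel(X, X_support)                 # n x m basis-function matrix
    Kmm = se_kernel(X_support, X_support) + jitter * np.eye(len(X_support))
    A = Knm.T @ Knm / noise_var + Kmm             # posterior precision of alpha
    L = np.linalg.cholesky(A)
    mu = np.linalg.solve(L.T, np.linalg.solve(L, Knm.T @ y)) / noise_var
    return mu, L

def rrgp_predict(X_star, X_support, mu, L, noise_var=0.1):
    """Predictive mean and variance of the (degenerate) RRGP at test inputs."""
    Ksm = se_kernel(X_star, X_support)
    mean = Ksm @ mu
    V = np.linalg.solve(L, Ksm.T)                 # gives k_*m A^{-1} k_*m^T below
    var = np.sum(V**2, axis=0) + noise_var
    return mean, var

# Toy usage: 1-D regression with a random support set of size m << n.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
X_support = X[rng.choice(len(X), size=15, replace=False)]
mu, L = rrgp_fit(X, y, X_support)
mean, var = rrgp_predict(np.linspace(-3, 3, 5)[:, None], X_support, mu, L)
```

Note that, as the abstract points out, the predictive variance of this degenerate model can be unreasonably small far from the support set; avoiding this requires augmenting the model at test time rather than using the plain predictor sketched here.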