While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent proliferation of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While GPs are in general equivalent to infinite linear models, we show that Reduced Rank Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP so that it is not degenerate at test time. Training an RRGP involves both learning the covariance function hyperparameters and selecting the support set. We propose a method for learning the hyperparameters for a given support set. We also review the Sparse Greedy GP (SGGP) approximation (Smola and Bartlett, 2001), a method for learning the support set for given hyperparameters by approximating the posterior. We propose an alternative to the SGGP with better generalization properties. Finally, we present experiments comparing the different ways of training an RRGP. We provide some Matlab code for learning RRGPs.
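To illustrate the finite sparse linear model view of the RRGP mentioned above, the following NumPy sketch computes the posterior over the weights of the model f(x) = sum_i alpha_i k(x, x_i) defined on a support set, and the resulting (degenerate, i.e. not augmented at test time) predictive distribution. This is not the authors' Matlab code; the squared-exponential kernel, the fixed hyperparameters, and the randomly chosen support set are assumptions made purely for the example.

```python
import numpy as np

def se_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the row vectors of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return signal_var * np.exp(-0.5 * d2 / lengthscale**2)

def rrgp_fit(X, y, X_support, noise_var=0.1, jitter=1e-8):
    """Posterior over the weights alpha of the finite linear model
    f(x) = sum_i alpha_i k(x, x_i), with prior alpha ~ N(0, Kmm^{-1})."""
    Knm = se_kernel(X, X_support)                 # n x m basis-function matrix
    Kmm = se_kernel(X_support, X_support) + jitter * np.eye(len(X_support))
    A = Knm.T @ Knm / noise_var + Kmm             # posterior precision of alpha
    L = np.linalg.cholesky(A)
    mu = np.linalg.solve(L.T, np.linalg.solve(L, Knm.T @ y)) / noise_var
    return mu, L

def rrgp_predict(X_star, X_support, mu, L, noise_var=0.1):
    """Predictive mean and variance of the (degenerate) RRGP at test inputs."""
    Ksm = se_kernel(X_star, X_support)
    mean = Ksm @ mu
    V = np.linalg.solve(L, Ksm.T)                 # gives k_*m A^{-1} k_*m^T below
    var = np.sum(V**2, axis=0) + noise_var
    return mean, var

# Toy usage: 1-D regression with a random support set of size m << n.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
X_support = X[rng.choice(len(X), size=15, replace=False)]
mu, L = rrgp_fit(X, y, X_support)
mean, var = rrgp_predict(np.linspace(-3, 3, 5)[:, None], X_support, mu, L)
```

Note that, as the abstract points out, the predictive variance of this degenerate model can be unreasonably small far from the support set; avoiding this requires augmenting the model at test time rather than using the plain predictor sketched here.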