Learning curves for Gaussian process (GP) regression can be strongly affected by a mismatch between the ‘student’ model and the ‘teacher’ (the true data-generating process), exhibiting, e.g., multiple overfitting maxima and logarithmically slow learning. I investigate whether GPs can be made robust against such effects by adapting student model hyperparameters to maximize the evidence (data likelihood). An approximation for the average evidence is derived and used to predict the optimal hyperparameter values and the resulting generalization error. For large input space dimension, where the approximation becomes exact, Bayes-optimal performance is obtained at the evidence maximum, but the actual hyperparameters (e.g. the noise level) do not necessarily reflect the properties of the teacher. Also, the theoretically achievable evidence maximum cannot always be reached with the chosen set of hyperparameters, and maximizing the evidence in such cases can actually make generalization performance worse rather than better. In lower-dimensional learning scenarios, the theory predicts, in excellent qualitative and good quantitative accord with simulations, that evidence maximization eliminates logarithmically slow learning and recovers the optimal scaling of the decrease of generalization error with training set size.
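The hyperparameter adaptation described above can be illustrated with a minimal sketch: for GP regression, the evidence (log marginal likelihood) is computed in closed form, and the student's noise-level hyperparameter is chosen to maximize it. The RBF kernel, the synthetic teacher (a noisy sine), and all numerical values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    # Squared-exponential (RBF) covariance between two sets of 1-D inputs.
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def log_evidence(y, K, noise_var):
    # Log marginal likelihood log p(y | X, hyperparameters) for GP regression:
    # -0.5 y^T (K + s^2 I)^{-1} y - 0.5 log|K + s^2 I| - (n/2) log(2 pi),
    # computed stably via a Cholesky factorization.
    n = len(y)
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2 * np.pi))

# Synthetic teacher (an assumption for illustration): noisy sine observations.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, 30)
true_noise_var = 0.1
y = np.sin(X) + rng.normal(0.0, np.sqrt(true_noise_var), 30)

# Student: fixed RBF kernel; adapt only the noise variance by maximizing
# the evidence over a grid of candidate values.
K = rbf_kernel(X, X)
noise_grid = np.logspace(-3, 1, 50)
ev = np.array([log_evidence(y, K, s2) for s2 in noise_grid])
best_noise = noise_grid[np.argmax(ev)]
print(f"evidence-maximizing noise variance: {best_noise:.3f}")
```

As the abstract notes, the evidence-maximizing noise level need not coincide with the teacher's true noise level; it is whatever value best trades off data fit against model complexity under the (possibly mismatched) student prior.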