L2 regularization for learning kernels

Authors:
Corinna Cortes; Mehryar Mohri; Afshin Rostamizadeh
Affiliations:
Google Research, New York; Courant Institute and Google Research; New York University
Venue:
UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
Year:
2009

Abstract

The choice of the kernel is critical to the success of many learning algorithms, but it is typically left to the user. Instead, the training data can be used to learn the kernel by selecting it out of a given family, such as that of non-negative linear combinations of p base kernels, constrained by a trace or L1 regularization. This paper studies the problem of learning kernels with the same family of kernels but with an L2 regularization instead, and for regression problems. We analyze the problem of learning kernels with ridge regression. We derive the form of the solution of the optimization problem and give an efficient iterative algorithm for computing that solution. We present a novel theoretical analysis of the problem based on stability and give learning bounds for orthogonal kernels that contain only an additive term O(√p/m) when compared to the standard kernel ridge regression stability bound. We also report the results of experiments indicating that L1 regularization can lead to modest improvements for a small number of kernels, but to performance degradations in larger-scale cases. In contrast, L2 regularization never degrades performance and in fact achieves significant improvements with a large number of kernels.
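
To make the setting concrete, here is a minimal sketch of kernel ridge regression over a non-negative linear combination of p base kernels, with the weights confined to an L2 ball. The abstract does not state the update rule, so everything specific here is an assumption for illustration: the function names, the parameters `radius`, `mu0` and `eta`, and the fixed-point scheme itself (weights set proportional to the quadratic forms of the dual vector, then a damped dual re-solve). It should be read as one plausible way such an iteration could look, not as the paper's algorithm.

```python
import numpy as np


def combine_kernels(kernels, mu):
    """Weighted sum of base kernel matrices; kernels has shape (p, m, m)."""
    return np.tensordot(mu, kernels, axes=1)


def krr_dual(K, y, lam):
    """Dual kernel ridge regression solution alpha = (K + lam * I)^{-1} y."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)


def learn_kernel_l2(kernels, y, lam=1.0, radius=1.0, mu0=None,
                    eta=0.5, tol=1e-8, max_iter=500):
    """Assumed fixed-point sketch for L2-constrained kernel weights.

    Hypothetical parameters (not from the source): `radius` is the L2-ball
    radius around the centre `mu0`, and `eta` damps the dual-variable update.
    """
    p = kernels.shape[0]
    mu0 = np.zeros(p) if mu0 is None else np.asarray(mu0, dtype=float)
    # Start from uniform weights on the boundary of the L2 ball.
    mu = mu0 + radius * np.ones(p) / np.sqrt(p)
    alpha = krr_dual(combine_kernels(kernels, mu), y, lam)
    for _ in range(max_iter):
        # v[k] = alpha^T K_k alpha, non-negative when each K_k is PSD.
        v = np.einsum("i,kij,j->k", alpha, kernels, alpha)
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            break  # degenerate case, e.g. y is all zeros
        mu = mu0 + radius * v / norm_v
        # Damped update: mix the old dual vector with a fresh ridge solve.
        alpha_new = (eta * alpha
                     + (1.0 - eta) * krr_dual(combine_kernels(kernels, mu), y, lam))
        converged = np.linalg.norm(alpha_new - alpha) <= tol
        alpha = alpha_new
        if converged:
            break
    # mu is the weight vector from the last update; alpha is the last dual iterate.
    return mu, alpha
```

With `mu0` at the origin and positive semidefinite base kernels, the returned weights are non-negative and have L2 norm equal to `radius`, which is the kind of constraint the abstract contrasts with trace or L1 regularization. No convergence guarantee is claimed for this sketch.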