A DC-programming algorithm for kernel selection

Authors:
Andreas Argyriou; Raphael Hauser; Charles A. Micchelli; Massimiliano Pontil
Affiliations:
University College London, London, UK; Oxford University Computing Laboratory, Oxford, UK; State University of New York, The University at Albany, Albany, NY; University College London, London, UK
Venue:
ICML '06 Proceedings of the 23rd international conference on Machine learning
Year:
2006

Abstract

We address the problem of learning a kernel for a given supervised learning task. Our approach searches the convex hull of a prescribed set of basic kernels for one that minimizes a convex regularization functional. A distinguishing feature of this approach, compared with others in the literature, is that the number of basic kernels can be infinite; we require only that they be continuously parameterized. For example, the basic kernels could be isotropic Gaussians with variance in a prescribed interval, or even Gaussians parameterized by multiple continuous parameters. Our work builds upon a formulation involving a minimax optimization problem and a recently proposed greedy algorithm for learning the kernel. Although this optimization problem is not convex, it belongs to the larger class of DC (difference of convex functions) programs. We therefore apply recent results from DC optimization theory to create a new algorithm for learning the kernel. Our experimental results on benchmark data sets show that this algorithm outperforms a previously proposed method.
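
To make the search space concrete, the sketch below builds one element of the convex hull of isotropic Gaussian kernels: a weighted sum of Gaussians whose weights are non-negative and sum to one. It is an illustration only and not the algorithm of the paper. The function names, the parameterization by a standard deviation `sigma` (so the variance is `sigma ** 2`), and the sample points, widths, and weights are all assumptions made for this example.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Isotropic Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2)).

    The parameterization by a standard deviation `sigma` is an assumption
    of this sketch; the abstract only says the variance lies in an interval.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def combined_kernel(x, y, sigmas, weights):
    """Convex combination sum_j weights[j] * K_{sigmas[j]}(x, y)."""
    if len(sigmas) != len(weights):
        raise ValueError("sigmas and weights must have the same length")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * gaussian_kernel(x, y, s) for w, s in zip(weights, sigmas))


def gram_matrix(points, sigmas, weights):
    """Kernel (Gram) matrix of the combined kernel on a list of points."""
    return [[combined_kernel(p, q, sigmas, weights) for q in points]
            for p in points]


# Hypothetical data: three points in the plane, three kernel widths.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
sigmas = [0.5, 1.0, 2.0]
weights = [0.2, 0.5, 0.3]

K = gram_matrix(points, sigmas, weights)
```

Only finitely many widths appear here. In the setting described in the abstract, the width ranges over a continuous interval, so a learning algorithm must also decide which parameter values receive nonzero weight; this sketch does not attempt that step.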