Real and complex analysis, 3rd ed.
On the limited memory BFGS method for large scale optimization
Mathematical Programming: Series A and B
Making large-scale support vector machine learning practical
Advances in kernel methods
Fast training of support vector machines using sequential minimal optimization
Advances in kernel methods
Dynamically adapting kernels in support vector machines
Proceedings of the 1998 conference on Advances in neural information processing systems II
An introduction to Support Vector Machines: and other kernel-based learning methods
Machine Learning
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
Choosing Multiple Parameters for Support Vector Machines
Machine Learning
Mustererkennung 1998, 20. DAGM-Symposium
Priors, Stabilizers and Basis Functions: From Regularization to Radial, Tensor and Additive Splines
Classes of kernels for machine learning: a statistics perspective
The Journal of Machine Learning Research
Convex Optimization
Learning the Kernel Matrix with Semidefinite Programming
The Journal of Machine Learning Research
Multiple kernel learning, conic duality, and the SMO algorithm
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Learning the Kernel with Hyperkernels
The Journal of Machine Learning Research
Learning the Kernel Function via Regularization
The Journal of Machine Learning Research
Gradient-Based Adaptation of General Gaussian Kernels
Neural Computation
A DC-programming algorithm for kernel selection
ICML '06 Proceedings of the 23rd international conference on Machine learning
Learning Theory: An Approximation Theory Viewpoint (Cambridge Monographs on Applied & Computational Mathematics)
Large Scale Multiple Kernel Learning
The Journal of Machine Learning Research
Learnability of Gaussians with Flexible Variances
The Journal of Machine Learning Research
Learning convex combinations of continuously parameterized basic kernels
COLT'05 Proceedings of the 18th annual conference on Learning Theory
An overview of statistical learning theory
IEEE Transactions on Neural Networks
Optimizing the kernel in the empirical feature space
IEEE Transactions on Neural Networks
A geometric approach to Support Vector Machine (SVM) classification
IEEE Transactions on Neural Networks
Appropriate selection of the kernel function, which implicitly defines the feature space of an algorithm, plays a crucial role in the success of kernel methods. In this paper, we consider the problem of optimizing a kernel function over the class of translation invariant kernels for the task of binary classification. The learning capacity of this class is invariant with respect to rotation and scaling of the features, and it encompasses the set of radial kernels. We show how translation invariant kernel functions can be embedded in a nested set of sub-classes and consider the kernel learning problem over one of these sub-classes. This allows an appropriate sub-class to be chosen based on the problem at hand. We use the criterion proposed by Lanckriet et al. (2004) to obtain a functional formulation of the problem and prove that the optimal kernel is a finite mixture of cosine functions. The kernel learning problem is then formulated as a semi-infinite programming (SIP) problem, which is solved by a sequence of quadratically constrained quadratic programming (QCQP) sub-problems. Using the fact that the cosine kernel is of rank two, we propose a formulation of the QCQP sub-problem that does not require the kernel matrices to be loaded into memory, making the method applicable to large-scale problems. We also address the issue of including other classes of kernels, such as individual kernels and isotropic Gaussian kernels, in the learning process. Another interesting feature of the proposed method is that the optimal classifier has an expansion in terms of the cosine kernels rather than the support vectors, leading to a remarkable speedup at run time. As a by-product, we also generalize the kernel trick to complex-valued kernel functions. Our experiments on artificial and real-world benchmark data sets, including the USPS and MNIST digit recognition data sets, demonstrate the usefulness of the proposed method.
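
To make the rank-two claim concrete: a single cosine kernel k(x, y) = cos(w · (x − y)) factors through the two-dimensional feature map (cos(w · x), sin(w · x)) via the identity cos(a − b) = cos(a)cos(b) + sin(a)sin(b). This is what allows the QCQP sub-problems to avoid storing Gram matrices and the classifier to be evaluated per cosine term rather than per support vector. The following minimal NumPy sketch illustrates the idea; the function name cosine_gram, the single-kernel setting, and the stand-in coefficients are our own illustrative assumptions, not the paper's implementation.

import numpy as np

# Illustrative sketch, not the paper's code: a single cosine kernel
# k(x, y) = cos(w @ (x - y)) admits the explicit rank-two feature map
# phi(x) = (cos(w @ x), sin(w @ x)), by cos(a - b) = cos a cos b + sin a sin b.

def cosine_gram(X, Y, w):
    """Gram matrix of k(x, y) = cos(w @ (x - y)) via the rank-two expansion."""
    a, b = X @ w, Y @ w
    return np.outer(np.cos(a), np.cos(b)) + np.outer(np.sin(a), np.sin(b))

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
w = rng.standard_normal(10)
K = cosine_gram(X, X, w)
print(np.linalg.matrix_rank(K))  # 2: K never needs to be held in memory
                                 # if only products K @ v are required.

# The same identity collapses a kernel expansion over training points.
# With stand-in dual coefficients alpha, the scalars
#   c = sum_i alpha_i * cos(w @ x_i),  s = sum_i alpha_i * sin(w @ x_i)
# are precomputed once, so evaluating the classifier on a new point costs
# O(number of cosine kernels) rather than O(number of support vectors).
alpha = rng.standard_normal(200)
c, s = alpha @ np.cos(X @ w), alpha @ np.sin(X @ w)
x_new = rng.standard_normal(10)
f = c * np.cos(w @ x_new) + s * np.sin(w @ x_new)
assert np.isclose(f, alpha @ cosine_gram(X, x_new[None, :], w).ravel())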