Real and complex analysis, 3rd ed.
On the limited memory BFGS method for large scale optimization
Mathematical Programming: Series A and B
Making large-scale support vector machine learning practical
Advances in kernel methods
Fast training of support vector machines using sequential minimal optimization
Advances in kernel methods
Dynamically adapting kernels in support vector machines
Proceedings of the 1998 conference on Advances in neural information processing systems II
An introduction to Support Vector Machines: and other kernel-based learning methods
Machine Learning
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
Choosing Multiple Parameters for Support Vector Machines
Machine Learning
Mustererkennung 1998, 20. DAGM-Symposium
Priors, Stabilizers and Basis Functions: From Regularization to Radial, Tensor and Additive Splines
Classes of kernels for machine learning: a statistics perspective
The Journal of Machine Learning Research
Convex Optimization
Learning the Kernel Matrix with Semidefinite Programming
The Journal of Machine Learning Research
Multiple kernel learning, conic duality, and the SMO algorithm
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Learning the Kernel with Hyperkernels
The Journal of Machine Learning Research
Learning the Kernel Function via Regularization
The Journal of Machine Learning Research
Gradient-Based Adaptation of General Gaussian Kernels
Neural Computation
A DC-programming algorithm for kernel selection
ICML '06 Proceedings of the 23rd international conference on Machine learning
Learning Theory: An Approximation Theory Viewpoint (Cambridge Monographs on Applied & Computational Mathematics)
Large Scale Multiple Kernel Learning
The Journal of Machine Learning Research
Learnability of Gaussians with Flexible Variances
The Journal of Machine Learning Research
Learning convex combinations of continuously parameterized basic kernels
COLT'05 Proceedings of the 18th annual conference on Learning Theory
An overview of statistical learning theory
IEEE Transactions on Neural Networks
Optimizing the kernel in the empirical feature space
IEEE Transactions on Neural Networks
A geometric approach to Support Vector Machine (SVM) classification
IEEE Transactions on Neural Networks
Appropriate selection of the kernel function, which implicitly defines the feature space of an algorithm, plays a crucial role in the success of kernel methods. In this paper, we consider the problem of optimizing a kernel function over the class of translation invariant kernels for the task of binary classification. The learning capacity of this class is invariant with respect to rotation and scaling of the features, and it encompasses the set of radial kernels. We show how translation invariant kernel functions can be embedded in a nested set of sub-classes and consider the kernel learning problem over one of these sub-classes. This allows an appropriate sub-class to be chosen based on the problem at hand. We use the criterion proposed by Lanckriet et al. (2004) to obtain a functional formulation of the problem and prove that the optimal kernel is a finite mixture of cosine functions. The kernel learning problem is then formulated as a semi-infinite programming (SIP) problem, which is solved by a sequence of quadratically constrained quadratic programming (QCQP) sub-problems. Using the fact that the cosine kernel is of rank two, we propose a formulation of the QCQP sub-problem that does not require the kernel matrices to be loaded into memory, making the method applicable to large-scale problems. We also address the issue of including other classes of kernels, such as individual kernels and isotropic Gaussian kernels, in the learning process. Another interesting feature of the proposed method is that the optimal classifier has an expansion in terms of the cosine kernels rather than the support vectors, leading to a remarkable speedup at run time. As a by-product, we also generalize the kernel trick to complex-valued kernel functions. Our experiments on artificial and real-world benchmark data sets, including the USPS and MNIST digit recognition data sets, demonstrate the usefulness of the proposed method.
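
To make the rank-two claim concrete: a single cosine kernel k(x, y) = cos(w · (x − y)) factors through the two-dimensional feature map (cos(w · x), sin(w · x)) via the identity cos(a − b) = cos(a)cos(b) + sin(a)sin(b). This is what allows the QCQP sub-problems to avoid storing Gram matrices and the classifier to be evaluated per cosine term rather than per support vector. The following minimal NumPy sketch illustrates the idea; the function name cosine_gram, the single-kernel setting, and the stand-in coefficients are our own illustrative assumptions, not the paper's implementation.

import numpy as np

# Illustrative sketch, not the paper's code: a single cosine kernel
# k(x, y) = cos(w @ (x - y)) admits the explicit rank-two feature map
# phi(x) = (cos(w @ x), sin(w @ x)), by cos(a - b) = cos a cos b + sin a sin b.

def cosine_gram(X, Y, w):
    """Gram matrix of k(x, y) = cos(w @ (x - y)) via the rank-two expansion."""
    a, b = X @ w, Y @ w
    return np.outer(np.cos(a), np.cos(b)) + np.outer(np.sin(a), np.sin(b))

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
w = rng.standard_normal(10)
K = cosine_gram(X, X, w)
print(np.linalg.matrix_rank(K))  # 2: K never needs to be held in memory
                                 # if only products K @ v are required.

# The same identity collapses a kernel expansion over training points.
# With stand-in dual coefficients alpha, the scalars
#   c = sum_i alpha_i * cos(w @ x_i),  s = sum_i alpha_i * sin(w @ x_i)
# are precomputed once, so evaluating the classifier on a new point costs
# O(number of cosine kernels) rather than O(number of support vectors).
alpha = rng.standard_normal(200)
c, s = alpha @ np.cos(X @ w), alpha @ np.sin(X @ w)
x_new = rng.standard_normal(10)
f = c * np.cos(w @ x_new) + s * np.sin(w @ x_new)
assert np.isclose(f, alpha @ cosine_gram(X, x_new[None, :], w).ravel())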