Multiple Kernel Learning (MKL) aims to learn the kernel in an SVM from training data. Many MKL formulations have been proposed, and some have proved effective in certain applications. Nevertheless, as MKL is a nascent field, many more formulations need to be developed to generalize across domains and meet the challenges of real-world applications. However, each MKL formulation typically necessitates the development of a specialized optimization algorithm. The lack of an efficient, general-purpose optimizer capable of handling a wide range of formulations presents a significant challenge to those looking to take MKL out of the lab and into the real world. This problem was somewhat alleviated by the development of the Generalized Multiple Kernel Learning (GMKL) formulation, which admits fairly general kernel parameterizations and regularizers subject to mild constraints. However, the projected gradient descent optimizer used for GMKL is inefficient, as computing the step size, a reasonably accurate objective value, and the gradient direction are all expensive. We overcome these limitations by developing a Spectral Projected Gradient (SPG) descent optimizer which: (a) takes second-order information into account when selecting step sizes; (b) employs a non-monotone step size selection criterion requiring fewer function evaluations; (c) is robust to gradient noise; and (d) can take quick steps when far from the optimum. We show that our proposed SPG-GMKL optimizer can be an order of magnitude faster than projected gradient descent, even on small and medium-sized datasets. In some cases, SPG-GMKL can even outperform state-of-the-art specialized optimizers developed for a single MKL formulation. Furthermore, we demonstrate that SPG-GMKL scales well beyond gradient descent to large problems involving a million kernels or half a million data points. Our code and implementation are publicly available.
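
The abstract only names the ingredients of the SPG optimizer, so as a rough illustration (not the authors' implementation), the sketch below shows a generic spectral projected gradient loop: a Barzilai-Borwein step length supplies cheap curvature (second-order) information, and a Grippo-style non-monotone Armijo test accepts a step as long as it improves on the worst of the last M objective values, which typically cuts the number of function evaluations. All names here (spg_minimize, project, the toy non-negative least-squares problem) are hypothetical and chosen for the example.

import numpy as np

def spg_minimize(f, grad, project, x0, max_iter=200, M=10,
                 alpha_min=1e-10, alpha_max=1e10, tol=1e-6):
    # Generic SPG loop: Barzilai-Borwein spectral step lengths plus a
    # non-monotone Armijo backtracking line search (illustrative sketch).
    x = project(np.asarray(x0, dtype=float))
    fx, gx = f(x), grad(x)
    history = [fx]        # recent objective values for the non-monotone test
    alpha = 1.0           # initial spectral step length
    for _ in range(max_iter):
        d = project(x - alpha * gx) - x       # projected gradient direction
        if np.linalg.norm(d, np.inf) < tol:   # stationary on the feasible set
            break
        f_ref = max(history[-M:])             # worst of the last M values
        lam, slope = 1.0, float(gx @ d)       # slope < 0 for a descent direction
        # Non-monotone Armijo condition: compare against f_ref, not f(x),
        # so occasional increases are tolerated and fewer evaluations needed.
        while f(x + lam * d) > f_ref + 1e-4 * lam * slope and lam > 1e-12:
            lam *= 0.5
        x_new = x + lam * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - gx
        sy = float(s @ y)
        # Barzilai-Borwein step: a scalar approximation of inverse curvature,
        # safeguarded to [alpha_min, alpha_max]; fall back if curvature <= 0.
        alpha = np.clip(float(s @ s) / sy, alpha_min, alpha_max) if sy > 0 else alpha_max
        x, gx = x_new, g_new
        fx = f(x)
        history.append(fx)
    return x, fx

# Toy usage (hypothetical): non-negative least squares, i.e. minimize
# 0.5 * ||A x - b||^2 over the non-negative orthant.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 20)), rng.normal(size=50)
f = lambda x: 0.5 * float(np.sum((A @ x - b) ** 2))
grad = lambda x: A.T @ (A @ x - b)
project = lambda x: np.maximum(x, 0.0)   # projection onto the feasible set
x_opt, f_opt = spg_minimize(f, grad, project, np.ones(20))

In an MKL setting the variables would be the kernel combination weights, the projection would enforce the formulation's constraints on those weights, and each f/grad call would involve an SVM solve, which is exactly why reducing function evaluations and tolerating noisy gradients matter.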