We propose a highly efficient framework for penalized likelihood kernel methods applied to multi-class models with a large, structured set of classes. In contrast to many previous approaches, which decompose the fitting problem into many smaller ones, we perform Newton optimization of the complete model, exploiting model structure and linear conjugate gradients to approximate Newton search directions. Crucially, our learning method is based entirely on matrix-vector multiplication primitives with the kernel matrices and their derivatives, allowing straightforward specialization to new kernels and focusing code optimization efforts on these primitives alone. Kernel parameters are learned automatically by maximizing the cross-validation log likelihood in a gradient-based way, and predictive probabilities are estimated. We demonstrate our approach on large-scale text classification tasks with hierarchical structure over thousands of classes, achieving state-of-the-art results in an order of magnitude less time than previous work. Parts of this work appeared in the conference paper Seeger (2007).
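To make the matrix-vector-only structure concrete, the following is a minimal sketch (not the authors' code) of a single truncated-Newton step for a binary penalized logistic model with latent function f = K alpha: the Newton system is solved approximately by linear conjugate gradients that touch the kernel matrix K only through a matvec primitive. The names `K_mv` and `make_hess_mv`, the logistic likelihood, the toy RBF kernel, and the undamped step are all illustrative assumptions; the full method additionally handles multi-class structured label sets and derivative matvecs for hyperparameter learning.

```python
import numpy as np

def conjugate_gradients(hess_mv, grad, max_iter=50, tol=1e-6):
    """Approximate the Newton direction d solving H d = -grad,
    touching H only through Hessian-vector products."""
    d = np.zeros_like(grad)
    r = -grad.copy()                # residual b - H d with d = 0, b = -grad
    p = r.copy()
    rs = float(r @ r)
    for _ in range(max_iter):
        Hp = hess_mv(p)
        step = rs / float(p @ Hp)
        d += step * p
        r -= step * Hp
        rs_new = float(r @ r)
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

def make_hess_mv(K_mv, W, lam):
    """Hessian-vector product for a penalized likelihood model with
    latent f = K alpha:  H v = K (W (K v)) + lam * (K v).
    Only the kernel matvec primitive K_mv is used."""
    def hess_mv(v):
        Kv = K_mv(v)
        return K_mv(W * Kv) + lam * Kv
    return hess_mv

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = (X[:, 0] > 0).astype(float)           # toy binary labels
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * sq)                     # toy RBF kernel matrix
    K_mv = lambda v: K @ v                    # the only kernel primitive used
    lam, alpha = 1.0, np.zeros(len(y))

    f = K_mv(alpha)
    pi = 1.0 / (1.0 + np.exp(-f))             # logistic likelihood, binary case
    grad = K_mv(pi - y + lam * alpha)         # gradient of the penalized objective
    W = pi * (1.0 - pi)                       # per-case curvature of -log p(y|f)
    alpha += conjugate_gradients(make_hess_mv(K_mv, W, lam), grad)  # Newton step
```

Because every operation above factors through `K_mv`, swapping in a new kernel, or a fast structured matvec, requires changing only that one primitive, which is the design point the abstract emphasizes.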