We propose a highly efficient framework for penalized likelihood kernel methods applied to multi-class models with a large, structured set of classes. In contrast to many previous approaches, which decompose the fitting problem into many smaller ones, we perform Newton optimization of the complete model, exploiting model structure and linear conjugate gradients to approximate Newton search directions. Crucially, our learning method is based entirely on matrix-vector multiplication primitives with the kernel matrices and their derivatives, allowing straightforward specialization to new kernels and focusing code optimization efforts on these primitives alone. Kernel parameters are learned automatically by maximizing the cross-validation log likelihood in a gradient-based way, and predictive probabilities are estimated. We demonstrate our approach on large-scale text classification tasks with hierarchical structure over thousands of classes, achieving state-of-the-art results in an order of magnitude less time than previous work. Parts of this work appeared in the conference paper Seeger (2007).
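To make the matrix-vector-only structure concrete, the following is a minimal sketch (not the authors' code) of a single truncated-Newton step for a binary penalized logistic model with latent function f = K alpha: the Newton system is solved approximately by linear conjugate gradients that touch the kernel matrix K only through a matvec primitive. The names `K_mv` and `make_hess_mv`, the logistic likelihood, the toy RBF kernel, and the undamped step are all illustrative assumptions; the full method additionally handles multi-class structured label sets and derivative matvecs for hyperparameter learning.

```python
import numpy as np

def conjugate_gradients(hess_mv, grad, max_iter=50, tol=1e-6):
    """Approximate the Newton direction d solving H d = -grad,
    touching H only through Hessian-vector products."""
    d = np.zeros_like(grad)
    r = -grad.copy()                # residual b - H d with d = 0, b = -grad
    p = r.copy()
    rs = float(r @ r)
    for _ in range(max_iter):
        Hp = hess_mv(p)
        step = rs / float(p @ Hp)
        d += step * p
        r -= step * Hp
        rs_new = float(r @ r)
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

def make_hess_mv(K_mv, W, lam):
    """Hessian-vector product for a penalized likelihood model with
    latent f = K alpha:  H v = K (W (K v)) + lam * (K v).
    Only the kernel matvec primitive K_mv is used."""
    def hess_mv(v):
        Kv = K_mv(v)
        return K_mv(W * Kv) + lam * Kv
    return hess_mv

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = (X[:, 0] > 0).astype(float)           # toy binary labels
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * sq)                     # toy RBF kernel matrix
    K_mv = lambda v: K @ v                    # the only kernel primitive used
    lam, alpha = 1.0, np.zeros(len(y))

    f = K_mv(alpha)
    pi = 1.0 / (1.0 + np.exp(-f))             # logistic likelihood, binary case
    grad = K_mv(pi - y + lam * alpha)         # gradient of the penalized objective
    W = pi * (1.0 - pi)                       # per-case curvature of -log p(y|f)
    alpha += conjugate_gradients(make_hess_mv(K_mv, W, lam), grad)  # Newton step
```

Because every operation above factors through `K_mv`, swapping in a new kernel, or a fast structured matvec, requires changing only that one primitive, which is the design point the abstract emphasizes.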