Discriminative techniques, such as conditional random fields (CRFs) or structure-aware maximum-margin techniques (maximum margin Markov networks (M3N), structured output support vector machines (S-SVM)), are the state of the art in the prediction of structured data. However, to achieve good results these techniques require complete and reliable ground truth, which is not always available in realistic problems. Furthermore, training either CRFs or margin-based techniques is computationally costly, because the runtime of current training methods depends not only on the size of the training set but also on properties of the output space to which the training samples are assigned.

We propose an alternative model for structured output prediction, Joint Kernel Support Estimation (JKSE), which is generative in nature: it relies on estimating the joint probability density of samples and labels in the training set. This makes it tolerant of incomplete or incorrect labels and also opens the possibility of learning in situations where more than one output label can be considered correct.

At the same time, we avoid the typical problems of generative models, because we do not attempt to learn the full joint probability distribution; we model only its support in a joint reproducing kernel Hilbert space. As a consequence, JKSE can be trained by an adaptation of the classical one-class SVM procedure. The resulting optimization problem is convex and efficiently solvable even with tens of thousands of training examples. A particular advantage of JKSE is that the training speed depends only on the size of the training set, not on the total size of the label space: no inference step is required during training (as it is for M3N and S-SVM), nor does a partition function have to be computed (as for CRFs).

Experiments on realistic data show that, for suitable kernel functions, our method works efficiently and robustly in situations where discriminative techniques struggle or become computationally infeasible.
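To make the construction concrete, the following is a minimal sketch of the JKSE idea in Python, using multiclass classification as a degenerate structured prediction problem. Training is exactly a one-class SVM fitted to joint feature vectors phi(x, y), and prediction maximizes the learned support function over all candidate labels. The tensor-product joint feature map, the helper names (joint_features, predict_jkse), and the toy data are illustrative assumptions, not the authors' implementation; for genuinely structured outputs the argmax would run over a combinatorial label space and require a search procedure rather than plain enumeration.

```python
# Illustrative sketch of Joint Kernel Support Estimation (JKSE) on a toy
# multiclass problem. The joint feature map and all helper names are
# assumptions made for this example, not the paper's reference code.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import OneClassSVM

def joint_features(X, y, n_labels):
    """Joint feature map phi(x, y) = x (tensor) e_y: each input vector is
    copied into the block of the output vector indexed by its label."""
    n, d = X.shape
    Phi = np.zeros((n, d * n_labels))
    for i in range(n):
        Phi[i, y[i] * d:(y[i] + 1) * d] = X[i]
    return Phi

# Toy data: three Gaussian blobs, one per label.
X, y = make_blobs(n_samples=300, centers=3, random_state=0)
n_labels = 3

# Training is a classical one-class SVM on the joint samples phi(x_i, y_i).
# No inference step and no partition function are needed; the cost depends
# only on the number of training pairs, as the abstract emphasizes.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
ocsvm.fit(joint_features(X, y, n_labels))

def predict_jkse(X_test):
    """Prediction: score phi(x, y) for every candidate label y and return
    the label that maximizes the one-class SVM decision function."""
    scores = np.column_stack([
        ocsvm.decision_function(
            joint_features(X_test, np.full(len(X_test), label), n_labels))
        for label in range(n_labels)
    ])
    return scores.argmax(axis=1)

y_pred = predict_jkse(X)
print("training accuracy:", (y_pred == y).mean())
```

Note how the nu parameter of the one-class SVM bounds the fraction of training pairs treated as outliers, which is one plausible reading of the label-noise tolerance claimed above: mislabeled pairs can simply fall outside the estimated support without distorting it.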