PAC-Bayesian Compression Bounds on the Prediction Error of Learning Algorithms for Classification

Authors:
Thore Graepel; Ralf Herbrich; John Shawe-Taylor
Affiliations:
Microsoft Research Cambridge, UK; Microsoft Research Cambridge, UK; School of Electronics and Computer Science, University of Southampton, UK
Venue:
Machine Learning
Year:
2005



Abstract

We consider bounds on the prediction error of classification algorithms based on sample compression. We refine the notion of a compression scheme to distinguish schemes that are permutation and repetition invariant from those that are not, which leads to different prediction error bounds. We also extend known results on compression to the case of non-zero empirical risk.

We provide bounds on the prediction error of classifiers returned by mistake-driven online learning algorithms by interpreting mistake bounds as bounds on the size of the respective compression scheme of the algorithm. This leads to a bound on the prediction error of perceptron solutions that depends on the margin a support vector machine would achieve on the same training sample.

Furthermore, using the property of compression, we derive bounds on the average prediction error of kernel classifiers in the PAC-Bayesian framework. These bounds assume a prior measure over the expansion coefficients in the data-dependent kernel expansion and bound the average prediction error uniformly over subsets of the space of expansion coefficients.
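
The idea of reading a mistake bound as a bound on compression-scheme size can be illustrated with a small sketch. It is not taken from the paper: the toy data, the function names `perceptron` and `reconstruct`, and the reference direction `u` are all assumptions made for illustration. The sketch runs the classical perceptron on linearly separable data, records the sequence of examples on which it errs, and checks that the final weight vector can be rebuilt from that sequence alone. It also compares the number of mistakes with the classical perceptron mistake bound (R / gamma)^2, where gamma is the margin of the known separator `u`. The maximum margin, which a hard-margin support vector machine would attain, can only be larger, so it would give a bound at least as tight.

```python
import numpy as np


def perceptron(X, y, max_epochs=1000):
    """Classical perceptron started at zero.

    Returns the weight vector and the ordered list of indices of the
    examples on which an update (mistake) occurred, repetitions included.
    """
    w = np.zeros(X.shape[1])
    mistakes = []
    for _ in range(max_epochs):
        clean_pass = True
        for i in range(len(y)):
            if y[i] * (X[i] @ w) <= 0:  # mistake (or zero margin): update
                w = w + y[i] * X[i]
                mistakes.append(i)
                clean_pass = False
        if clean_pass:  # no mistakes in a full pass: converged
            break
    return w, mistakes


def reconstruct(X, y, mistakes):
    """Rebuild the classifier from the mistake sequence only."""
    w = np.zeros(X.shape[1])
    for i in mistakes:
        w = w + y[i] * X[i]
    return w


# Toy separable data (an assumption of this sketch): labels come from a
# known unit-norm direction u, and points too close to the boundary are
# dropped so that the margin with respect to u exceeds 0.5.
rng = np.random.default_rng(0)
u = np.array([1.0, -1.0]) / np.sqrt(2.0)
X = rng.normal(size=(200, 2))
X = X[np.abs(X @ u) > 0.5]
y = np.sign(X @ u)

w, mistakes = perceptron(X, y)

R = np.max(np.linalg.norm(X, axis=1))  # radius of the data
gamma = np.min(y * (X @ u))            # margin achieved by u

print("training examples:         ", len(y))
print("mistakes (with repeats):   ", len(mistakes))
print("distinct mistake examples: ", len(set(mistakes)))
print("mistake bound (R/gamma)^2: ", (R / gamma) ** 2)
print("rebuilt from mistakes only:", np.allclose(w, reconstruct(X, y, mistakes)))
```

The examples that never trigger an update play no role in the final classifier, which is what makes the mistake sequence usable as a compression set. In this sketch the reconstruction sums the recorded updates, so it depends on how often each example was used; this is one concrete instance of why it matters whether a scheme is repetition invariant.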