Algorithmic luckiness

Authors:
Ralf Herbrich; Robert C. Williamson
Affiliations:
Microsoft Research, 7 J J Thomson Avenue, CB3 0FB Cambridge, United Kingdom; National ICT Australia, Research School of Information Sciences and Engineering, Australian National University, Canberra, ACT 0200, Australia
Venue:
The Journal of Machine Learning Research
Year:
2003

Citing 17
Cited 11

Optimal algorithms for approximate clustering

STOC '88 Proceedings of the twentieth annual ACM symposium on Theory of computing
Learnability and the Vapnik-Chervonenkis dimension

Journal of the ACM (JACM)
A training algorithm for optimal margin classifiers

COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
A result of Vapnik with applications

Discrete Applied Mathematics
An introduction to computational learning theory

An introduction to computational learning theory
The nature of statistical learning theory

The nature of statistical learning theory
Support-Vector Networks

Machine Learning
Sample Compression, Learnability, and the Vapnik-Chervonenkis Dimension

Machine Learning
Uniform approximation by neural networks

Journal of Approximation Theory
A sharp concentration inequality with application

Random Structures & Algorithms
AI Game Programming Wisdom

AI Game Programming Wisdom
Generalisation Error Bounds for Sparse Linear Classifiers

COLT '00 Proceedings of the Thirteenth Annual Conference on Computational Learning Theory
Clustering Motion

FOCS '01 Proceedings of the 42nd IEEE symposium on Foundations of Computer Science
Sparse bayesian learning and the relevance vector machine

The Journal of Machine Learning Research
Bayes point machines

The Journal of Machine Learning Research
Estimation of Dependences Based on Empirical Data (Springer Series in Statistics)

Estimation of Dependences Based on Empirical Data (Springer Series in Statistics)
Structural risk minimization over data-dependent hierarchies

IEEE Transactions on Information Theory

An introduction to boosting and leveraging

Advanced lectures on machine learning
The set covering machine

The Journal of Machine Learning Research
On the Importance of Small Coordinate Projections

The Journal of Machine Learning Research
PAC-Bayesian Compression Bounds on the Prediction Error of Learning Algorithms for Classification

Machine Learning
The cross entropy method for classification

ICML '05 Proceedings of the 22nd international conference on Machine learning
Simpler knowledge-based support vector machines

ICML '06 Proceedings of the 23rd international conference on Machine learning
Exploiting Cluster-Structure to Predict the Labeling of a Graph

ALT '08 Proceedings of the 19th international conference on Algorithmic Learning Theory
Exact combinatorial bounds on the probability of overfitting for empirical risk minimization

Pattern Recognition and Image Analysis
The Sample Complexity of Dictionary Learning

The Journal of Machine Learning Research
Controlling sparseness in non-negative tensor factorization

ECCV'06 Proceedings of the 9th European conference on Computer Vision - Volume Part I
Reverse-Convex programming for sparse image codes

EMMCVPR'05 Proceedings of the 5th international conference on Energy Minimization Methods in Computer Vision and Pattern Recognition

Abstract

Classical statistical learning theory studies the generalisation performance of machine learning algorithms rather indirectly. One of the main detours is that algorithms are studied in terms of the hypothesis class from which they draw their hypotheses. In this paper, motivated by the luckiness framework of Shawe-Taylor et al. (1998), we study learning algorithms more directly and in a way that allows us to exploit the serendipity of the training sample. The main difference from previous approaches lies in the complexity measure: rather than covering all hypotheses in a given hypothesis space, it is only necessary to cover the functions that could have been learned using the fixed learning algorithm. We show how the resulting framework relates to the VC, luckiness and compression frameworks. Finally, we present an application of this framework to the maximum margin algorithm for linear classifiers, which results in a bound that exploits the margin, the sparsity of the resultant weight vector, and the degree of clustering of the training data in feature space.