Structural risk minimization over data-dependent hierarchies

Authors:
J. Shawe-Taylor;P. L. Bartlett;R. C. Williamson;M. Anthony
Affiliations:
Dept. of Comput. Sci., London Univ.;-;-;-
Venue:
IEEE Transactions on Information Theory
Year:
1998

Citing 0
Cited 61

A PAC analysis of a Bayesian estimator

COLT '97 Proceedings of the tenth annual conference on Computational learning theory
Cross-validation for binary classification by real-valued functions: theoretical analysis

COLT '98 Proceedings of the eleventh annual conference on Computational learning theory
Covering numbers for support vector machines

COLT '99 Proceedings of the twelfth annual conference on Computational learning theory
On the VC Dimension of Bounded Margin Classifiers

Machine Learning
Editorial: Kernel Methods: Current Research and Future Directions

Machine Learning
Generalization Ability of Folding Networks

IEEE Transactions on Knowledge and Data Engineering
Mathematical Modelling of Generalization

WIRN VIETRI 2002 Proceedings of the 13th Italian Workshop on Neural Nets-Revised Papers
From Computational Learning Theory to Discovery Science

ICALP '99 Proceedings of the 26th International Colloquium on Automata, Languages and Programming
On the Generalization Ability of Recurrent Networks

ICANN '01 Proceedings of the International Conference on Artificial Neural Networks
Margin Distribution Bounds on Generalization

EuroCOLT '99 Proceedings of the 4th European Conference on Computational Learning Theory
Entropy Numbers, Operators and Support Vector Kernels

EuroCOLT '99 Proceedings of the 4th European Conference on Computational Learning Theory
A Note on the Generalization Performance of Kernel Classifiers with Margin

ALT '00 Proceedings of the 11th International Conference on Algorithmic Learning Theory
Geometric Bounds for Generalization in Boosting

COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Rademacher and Gaussian Complexities: Risk Bounds and Structural Results

COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Data-Dependent Margin-Based Generalization Bounds for Classification

COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Generalization Performance of Classifiers in Terms of Observed Covering Numbers

EuroCOLT '99 Proceedings of the 4th European Conference on Computational Learning Theory
An introduction to boosting and leveraging

Advanced lectures on machine learning
References

Intelligent data analysis
Bayes point machines

The Journal of Machine Learning Research
On the influence of the kernel on the consistency of support vector machines

The Journal of Machine Learning Research
Covering number bounds of certain regularized linear function classes

The Journal of Machine Learning Research
A new approximate maximal margin classification algorithm

The Journal of Machine Learning Research
Data-dependent margin-based generalization bounds for classification

The Journal of Machine Learning Research
Algorithmic luckiness

The Journal of Machine Learning Research
Pac-bayesian generalisation error bounds for gaussian process classification

The Journal of Machine Learning Research
Rademacher and gaussian complexities: risk bounds and structural results

The Journal of Machine Learning Research
The set covering machine

The Journal of Machine Learning Research
Smooth boosting and learning with malicious noise

The Journal of Machine Learning Research
Generalization error bounds for Bayesian mixture algorithms

The Journal of Machine Learning Research
A Support Vector Machine with a Hybrid Kernel and Minimal Vapnik-Chervonenkis Dimension

IEEE Transactions on Knowledge and Data Engineering
On the Importance of Small Coordinate Projections

The Journal of Machine Learning Research
Support Vector Machine Soft Margin Classifiers: Error Analysis

The Journal of Machine Learning Research
Explanation-Augmented SVM: an approach to incorporating domain knowledge into SVM learning

ICML '05 Proceedings of the 22nd international conference on Machine learning
Bounds on Error Expectation for Support Vector Machines

Neural Computation
Learnability of Gaussians with Flexible Variances

The Journal of Machine Learning Research
A discriminative framework for clustering via similarity functions

STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
VC Theory of Large Margin Multi-Category Classifiers

The Journal of Machine Learning Research
Asymmetric support vector machines: low false-positive learning under the user tolerance

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Optimum Neural Network Construction Via Linear Programming Minimum Sphere Set Covering

ADMA '07 Proceedings of the 3rd international conference on Advanced Data Mining and Applications
On Relevant Dimensions in Kernel Feature Spaces

The Journal of Machine Learning Research
Exploring Margin Maximization for Biometric Score Fusion

SSPR & SPR '08 Proceedings of the 2008 Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition
Towards a Linear Combination of Dichotomizers by Margin Maximization

ICIAP '09 Proceedings of the 15th International Conference on Image Analysis and Processing
Binarized Support Vector Machines

INFORMS Journal on Computing
Distribution-dependent PAC-bayes priors

ALT'10 Proceedings of the 21st international conference on Algorithmic learning theory
PAC-Bayesian Analysis of Co-clustering and Beyond

The Journal of Machine Learning Research
Maximum margin decision surfaces for increased generalisation in evolutionary decision tree learning

EuroGP'11 Proceedings of the 14th European conference on Genetic programming
A parallel genetic algorithm for solving the inverse problem of support vector machines

ICMLC'05 Proceedings of the 4th international conference on Advances in Machine Learning and Cybernetics
Rotational prior knowledge for SVMs

ECML'05 Proceedings of the 16th European conference on Machine Learning
A PAC-Style model for learning from labeled and unlabeled data

COLT'05 Proceedings of the 18th annual conference on Learning Theory
Generalization behaviour of alkemic decision trees

ILP'05 Proceedings of the 15th international conference on Inductive Logic Programming
Random projection, margins, kernels, and feature-selection

SLSFS'05 Proceedings of the 2005 international conference on Subspace, Latent Structure and Feature Selection
Combining multiple classifiers to quantitatively rank the impact of abnormalities in flight data

Applied Soft Computing
A review of optimization methodologies in support vector machines

Neurocomputing
Optimization the initial weights of artificial neural networks via genetic algorithm applied to hip bone fracture prediction

Advances in Fuzzy Systems - Special issue on Hybrid Biomedical Intelligent Systems
Robust classifier learning with fuzzy class labels for large-margin support vector machines

Neurocomputing
ALFRED: crowd assisted data extraction

Proceedings of the 22nd international conference on World Wide Web companion
A framework for learning web wrappers from the crowd

Proceedings of the 22nd international conference on World Wide Web
PAC-bayes bounds with data dependent priors

The Journal of Machine Learning Research
Learning Big (Image) Data via Coresets for Dictionaries

Journal of Mathematical Imaging and Vision
Efficient regression in metric spaces via approximate lipschitz extension

SIMBAD'13 Proceedings of the Second international conference on Similarity-Based Pattern Recognition
Learning bounds via sample width for classifiers on finite metric spaces

Theoretical Computer Science

Abstract

The paper introduces some generalizations of Vapnik's (1982) method of structural risk minimization (SRM). As well as making explicit some of the details on SRM, it provides a result that allows one to trade off errors on the training sample against improved generalization performance. It then considers the more general case when the hierarchy of classes is chosen in response to the data. A result is presented on the generalization performance of classifiers with a “large margin”. This theoretically explains the impressive generalization performance of the maximal margin hyperplane algorithm of Vapnik and co-workers (which is the basis for their support vector machines). The paper concludes with a more general result in terms of “luckiness” functions, which provides a quite general way for exploiting serendipitous simplicity in observed data to obtain better prediction accuracy from small training sets. Four examples are given of such functions, including the Vapnik-Chervonenkis (1971) dimension measured on the sample.
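
The trade-off between training errors and generalization that the abstract mentions can be illustrated with a minimal sketch of classical SRM over a fixed (not data-dependent) hierarchy. Everything here is an assumption made for illustration: the penalty is a generic textbook-style VC term rather than the bound proved in the paper, and the names `vc_penalty` and `srm_select`, the error values, and the VC dimensions are hypothetical.

```python
import math


def vc_penalty(d, m, delta):
    """Illustrative VC-style complexity term for a class of VC dimension d
    on m examples (an assumed form, not the paper's bound). Requires d < 2m."""
    return math.sqrt((d * (math.log(2 * m / d) + 1) + math.log(4 / delta)) / m)


def srm_select(train_errors, vc_dims, m, delta=0.05):
    """Return the index of the class minimizing training error plus penalty.

    The confidence parameter delta is split uniformly across the classes,
    one simple way to make the guarantee hold for all classes at once.
    """
    k = len(vc_dims)
    scores = [
        err + vc_penalty(d, m, delta / k)
        for err, d in zip(train_errors, vc_dims)
    ]
    best = min(range(k), key=scores.__getitem__)
    return best, scores


# Hypothetical nested classes: richer classes fit the sample better
# but pay a larger complexity penalty.
train_errors = [0.30, 0.10, 0.08, 0.07]
vc_dims = [1, 5, 50, 500]
best, scores = srm_select(train_errors, vc_dims, m=1000)
print(best, [round(s, 3) for s in scores])
```

With these made-up numbers the second class is selected: it accepts more training errors than the two richer classes in exchange for a much smaller complexity term. The data-dependent setting described in the abstract replaces the fixed `vc_dims` with quantities measured on the sample, such as the margin.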