In this paper we investigate connections between statistical learning theory and data compression in the context of support vector machine (SVM) model selection. Inspired by several generalization bounds, we construct "compression coefficients" for SVMs that measure the amount by which the training labels can be compressed by a code built from the separating hyperplane. The main idea is to relate the coding precision to geometric concepts such as the width of the margin and the shape of the data in the feature space. The resulting compression coefficients combine well-known quantities such as the radius-margin term R²/ρ², the eigenvalues of the kernel matrix, and the number of support vectors. To test whether they are useful in practice, we ran model selection experiments on benchmark data sets and found that the compression coefficients can fairly accurately predict the parameters for which the test error is minimized.
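As an illustration of the idea, the following minimal sketch shows how a compression-style score combining the radius-margin term R²/ρ² with the number of support vectors could drive kernel-parameter selection. This is not the paper's exact construction: the function name radius_margin_score, the additive combination of the two terms, and the crude bound R² ≤ 1 for an RBF kernel are all assumptions made for the example.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.metrics.pairwise import rbf_kernel
    from sklearn.svm import SVC

    def radius_margin_score(X, y, gamma, C=1.0):
        # Hypothetical compression-style score: a toy combination of the
        # radius-margin term R^2 * ||w||^2 (= R^2 / rho^2, since the margin
        # satisfies rho = 1/||w||) and the number of support vectors.
        K = rbf_kernel(X, X, gamma=gamma)
        clf = SVC(kernel="precomputed", C=C).fit(K, y)
        sv = clf.support_                 # indices of the support vectors
        alpha_y = clf.dual_coef_.ravel()  # alpha_i * y_i for each support vector
        # ||w||^2 = sum_ij (alpha_i y_i)(alpha_j y_j) K(x_i, x_j); non-SVs
        # contribute nothing because their alpha_i = 0.
        w_norm_sq = alpha_y @ K[np.ix_(sv, sv)] @ alpha_y
        # For an RBF kernel K(x, x) = 1, so every point lies on the unit
        # sphere in feature space and R^2 <= 1 is a crude bound; a tighter R
        # would require solving the minimal enclosing ball problem.
        R_sq = 1.0
        return R_sq * w_norm_sq + len(sv)

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    gammas = np.logspace(-3, 1, 9)
    scores = [radius_margin_score(X, y, g) for g in gammas]
    print("selected gamma:", gammas[int(np.argmin(scores))])

In the same spirit as the experiments described above, the kernel width minimizing such a score would be selected without ever touching a validation set; the paper's actual coefficients additionally involve the eigenvalues of the kernel matrix, which this sketch omits.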