Sample compression, margins and generalization: extensions to the set covering machine

  • Authors:
  • Mohak Shah

  • Affiliations:
  • University of Ottawa (Canada)

  • Venue:
  • Ph.D. thesis
  • Year:
  • 2006


Abstract

This thesis studies the generalization behavior of algorithms in sample compression settings. It extends the sample compression framework to derive data-dependent bounds that give tighter guarantees for algorithms where data-independent bounds, such as the VC bounds, are not applicable. It also studies the interplay between the sparsity and the separating margin of the classifier, and shows how new compression-based data-dependent bounds can be obtained that exploit these two quantities explicitly. These bounds not only provide tight generalization guarantees but also, by themselves, pose optimization problems whose solutions lead to novel learning algorithms. The thesis studies algorithms that learn conjunctions or disjunctions of data-dependent Boolean features. With the Set Covering Machine (SCM) as its basis, it shows how novel learning algorithms can be designed in compression settings that perform a non-trivial margin-sparsity trade-off to yield better classifiers. Moreover, the thesis shows how feature selection can be integrated with the learning process in these settings, yielding algorithms that not only perform successful feature selection but also come with provable theoretical guarantees. In particular, the thesis proposes two novel learning algorithms. The first is an SCM with data-dependent half-spaces, along with a tight compression bound that can successfully guide model selection. The second learns conjunctions of features called data-dependent Rays to classify gene expression data from DNA microarrays. The thesis shows how a PAC-Bayes approach to learning conjunctions of Rays can perform a non-trivial margin-sparsity trade-off, achieving classifiers that not only have provable theoretical guarantees but also use significantly fewer attributes than traditional feature selection algorithms.
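To make the construction concrete, the greedy SCM conjunction step described above can be sketched as follows. This is a minimal illustration, not the thesis's exact formulation: the function names, the threshold-based Ray generator, and the scoring rule (uncovered negatives rejected, minus a penalty for positives wrongly rejected) are assumptions made for the sketch.

```python
import numpy as np

def ray_features(X):
    """Data-dependent Rays (sketch): one threshold feature per
    (attribute, training value) pair, in both directions."""
    feats = []
    for i in range(X.shape[1]):
        for t in np.unique(X[:, i]):
            feats.append(lambda Z, i=i, t=t: Z[:, i] <= t)
            feats.append(lambda Z, i=i, t=t: Z[:, i] >= t)
    return feats

def scm_conjunction(X, y, features, penalty=1.0, max_features=5):
    """Greedy SCM sketch: build a conjunction of Boolean features.
    Each step adds the feature rejecting the most still-uncovered
    negatives, minus a penalty for positives it wrongly rejects;
    the penalty controls the sparsity/accuracy trade-off."""
    pos = (y == 1)
    uncovered = (y == 0)          # negatives the conjunction still accepts
    chosen = []
    while uncovered.any() and len(chosen) < max_features:
        best, best_score = None, -np.inf
        for h in features:
            out = h(X)
            score = (uncovered & ~out).sum() - penalty * (pos & ~out).sum()
            if score > best_score:
                best, best_score = h, score
        chosen.append(best)
        uncovered &= best(X)      # keep only negatives not yet rejected
    return chosen

def predict(chosen, Z):
    out = np.ones(len(Z), dtype=bool)
    for h in chosen:              # conjunction: all features must fire
        out &= h(Z)
    return out.astype(int)
```

On a one-dimensional toy set where positives lie below a single threshold, the greedy step selects just one Ray, illustrating the kind of sparse classifier the compression bounds reward.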
This thesis also proposes two new formulations of the classical SCM algorithm with data-dependent balls, aimed at performing a margin-sparsity trade-off by utilizing the Occam's Razor and PAC-Bayes principles, respectively. The thesis shows how such approaches yield more general classifiers with tight risk bounds that can potentially guide the model selection process.
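A data-dependent ball, the feature type underlying these SCM formulations, can be sketched as below; the centres, radii, and toy points are illustrative assumptions, since in practice centres are training examples and radii are chosen from distances to other training points.

```python
import numpy as np

def ball(center, radius):
    """Data-dependent ball (sketch): fires for points within `radius`
    of a training example `center`."""
    return lambda Z: np.linalg.norm(Z - center, axis=1) <= radius

# A conjunction of balls classifies positive only inside the
# intersection of the chosen regions (toy data for illustration).
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 3.0]])
h1, h2 = ball(X[0], 1.5), ball(X[1], 1.5)
pred = h1(X) & h2(X)   # the distant third point falls outside both balls
```

Because each ball is specified by a pair of training examples (a centre and a radius-defining point), the resulting classifier is a sample-compressed one, which is what makes the risk bounds above applicable.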