Mistake bounds and logarithmic linear-threshold learning algorithms
The weighted majority algorithm. Information and Computation.
The binary exponentiated gradient algorithm for learning linear functions. COLT '97: Proceedings of the Tenth Annual Conference on Computational Learning Theory.
Machine Learning - Special issue on context sensitivity and concept drift
Linear hinge loss and average margin. Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II.
Predicting nearly as well as the best pruning of a planar decision graph. ALT '99: Proceedings of the 10th International Conference on Algorithmic Learning Theory.
It is easy to design on-line learning algorithms for learning k-out-of-n-variable monotone disjunctions by simply keeping one weight per disjunction. Such algorithms use roughly O(n^k) weights, which can be prohibitively expensive. Surprisingly, algorithms like Winnow require only n weights (one per variable), and their mistake bounds are not much worse than those of the more costly algorithms. The purpose of this paper is to investigate how the exponentially many weights can be collapsed into only O(n) weights. In particular, we consider probabilistic assumptions under which the Bayes optimal algorithm's posterior over the disjunctions can be encoded with only O(n) weights. This yields a new O(n) algorithm for learning disjunctions which is related to Bylander's BEG algorithm, originally introduced for linear regression. Besides providing a Bayesian interpretation for this new algorithm, we also obtain mistake bounds for the noise-free case resembling those derived for the Winnow algorithm. The same techniques used to derive the new algorithm also provide a Bayesian interpretation for a normalized version of Winnow.
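The abstract contrasts the naive approach (one weight per disjunction, roughly O(n^k) weights) with algorithms like Winnow that keep only n weights. As a point of reference, the sketch below shows the classic Winnow update for monotone disjunctions with one weight per variable; it is not the paper's new BEG-related algorithm, and the function name, interface, and constants (threshold n/2, update factor alpha = 2) are illustrative assumptions.

```python
import numpy as np

def winnow_learn(examples, n, threshold=None, alpha=2.0):
    """Minimal sketch of Winnow for learning a monotone disjunction over n
    Boolean variables, using one weight per variable (O(n) weights).
    `examples` yields (x, y) pairs with x a 0/1 vector of length n and
    y in {0, 1}.  Names and constants here are illustrative assumptions."""
    if threshold is None:
        threshold = n / 2.0          # a standard choice of threshold
    w = np.ones(n)                   # start with all weights equal to 1
    mistakes = 0
    for x, y in examples:
        x = np.asarray(x, dtype=float)
        y_hat = 1 if w @ x >= threshold else 0
        if y_hat != y:
            mistakes += 1
            if y == 1:               # false negative: promote active weights
                w[x > 0] *= alpha
            else:                    # false positive: demote active weights
                w[x > 0] /= alpha
    return w, mistakes

# Example usage: target disjunction x_0 OR x_3 over n = 10 variables.
rng = np.random.default_rng(0)
n = 10
data = [(x, int(x[0] or x[3])) for x in rng.integers(0, 2, size=(200, n))]
w, m = winnow_learn(data, n)
print(m)  # mistake count; Winnow makes O(k log n) mistakes in the noise-free case
```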