Efficient bandit algorithms for online multiclass prediction

Authors:
Sham M. Kakade; Shai Shalev-Shwartz; Ambuj Tewari
Affiliations:
Toyota Technological Institute, Chicago, Illinois (all three authors)
Venue:
Proceedings of the 25th International Conference on Machine Learning
Year:
2008

References (11):
The perceptron: a probabilistic model for information storage and organization in the brain. Neurocomputing: Foundations of Research.
Exponentiated gradient versus gradient descent for linear predictors. Information and Computation.
Large Margin Classification Using the Perceptron Algorithm. Machine Learning (The Eleventh Annual Conference on Computational Learning Theory).
Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm. Machine Learning.
Gambling in a rigged casino: The adversarial multi-armed bandit problem. FOCS '95: Proceedings of the 36th Annual Symposium on Foundations of Computer Science.
Ultraconservative online algorithms for multiclass problems. The Journal of Machine Learning Research.
Online convex optimization in the bandit setting: gradient descent without a gradient. SODA '05: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms.
An algorithmic theory of learning: Robust concepts and random projection. Machine Learning.
Online multiclass learning by interclass hypothesis sharing. ICML '06: Proceedings of the 23rd International Conference on Machine Learning.
Online Passive-Aggressive Algorithms. The Journal of Machine Learning Research.
A primal-dual perspective of online learning algorithms. Machine Learning.

Cited by (11):
The offset tree for learning with partial labels. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
A contextual-bandit approach to personalized news article recommendation. Proceedings of the 19th International Conference on World Wide Web.
Exploitation and exploration in a performance based contextual advertising system. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Online learning in adversarial Lipschitz environments. ECML PKDD '10: Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases, Part II.
Learning to trade off between exploration and exploitation in multiclass bandit prediction. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Multi-armed bandits with episode context. Annals of Mathematics and Artificial Intelligence.
Learning with stochastic inputs and adversarial outputs. Journal of Computer and System Sciences.
Distribution-aware online classifiers. IJCAI '11: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume Two.
Multiclass classification with bandit feedback using adaptive regularization. Machine Learning.
Content recommendation on web portals. Communications of the ACM.
Upper confidence weighted learning for efficient exploration in multiclass prediction with binary feedback. IJCAI '13: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence.


Abstract

This paper introduces the Banditron, a variant of the Perceptron [Rosenblatt, 1958], for the multiclass bandit setting. The multiclass bandit setting models a wide range of practical supervised learning applications in which the learner receives only partial feedback about the true label (referred to as "bandit" feedback, in the spirit of multi-armed bandit models). For example, in many web applications users often provide only positive "click" feedback, which does not necessarily reveal the true label. The Banditron can learn in a multiclass classification setting under "bandit" feedback, which reveals only whether the algorithm's prediction was correct and not necessarily the true label. We provide (relative) mistake bounds showing that the Banditron enjoys favorable performance, and our experiments demonstrate the practicality of the algorithm. Furthermore, this paper pays close attention to the important special case in which the data is linearly separable, a problem that has been exhaustively studied in the full-information setting yet is novel in the bandit setting.
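The abstract describes the Banditron only at a high level. The following is a minimal sketch of one way a Perceptron variant can learn from bandit feedback, under these assumptions. It keeps a k x d weight matrix and computes the greedy Perceptron prediction. With probability gamma it explores a uniformly random label instead. It observes only whether the sampled label was correct, then applies an importance-weighted update whose expectation equals the ordinary multiclass Perceptron update. The function name `banditron_step`, the parameter `gamma`, and the uniform exploration scheme are illustrative assumptions. For the authors' exact algorithm, parameter settings, and mistake bounds, consult the paper itself.

```python
import random


def banditron_step(W, x, true_label, gamma, rng):
    """Run one round of a Banditron-style update on W (in place).

    Hypothetical interface, for illustration only:
      W: k x d weight matrix, a list of k rows of length d
      x: feature vector of length d
      true_label: used only to produce the 0/1 bandit feedback
      gamma: exploration rate in (0, 1)
      rng: object with a random.Random-style choices() method
    Returns the label that was actually predicted (the sampled label).
    """
    k = len(W)
    # Greedy Perceptron-style prediction: the class with the highest score.
    scores = [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in W]
    y_hat = max(range(k), key=lambda r: scores[r])

    # Sampling distribution: greedy label w.p. (1 - gamma), uniform otherwise.
    probs = [gamma / k + ((1.0 - gamma) if r == y_hat else 0.0) for r in range(k)]
    y_tilde = rng.choices(range(k), weights=probs)[0]

    # Bandit feedback: the update uses only whether y_tilde was correct.
    correct = y_tilde == true_label

    # Importance-weighted update. Averaged over the random draw of y_tilde,
    # row r moves by x * (1[r == y] - 1[r == y_hat]), which is the
    # full-information multiclass Perceptron update.
    for r in range(k):
        gain = 1.0 / probs[r] if (correct and r == y_tilde) else 0.0
        coef = gain - (1.0 if r == y_hat else 0.0)
        if coef != 0.0:
            W[r] = [w_j + coef * x_j for w_j, x_j in zip(W[r], x)]
    return y_tilde


if __name__ == "__main__":
    # Toy, hypothetical stream: the label is the index of the largest
    # coordinate of x, so the data is linearly separable (W* = identity).
    rng = random.Random(0)
    k, d, gamma = 3, 3, 0.1
    W = [[0.0] * d for _ in range(k)]
    mistakes = 0
    T = 5000
    for _ in range(T):
        x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        y = max(range(d), key=lambda j: x[j])
        mistakes += banditron_step(W, x, y, gamma, rng) != y
    print(f"mistake rate over {T} rounds: {mistakes / T:.3f}")
```

Uniform exploration keeps every sampling probability at least gamma / k, so the importance weight 1 / probs[r] never exceeds k / gamma. A smaller gamma exploits the greedy prediction more often but makes each update noisier. This is the exploration and exploitation tradeoff that the bandit setting forces on the learner.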