The perceptron: a probabilistic model for information storage and organization in the brain
Neurocomputing: foundations of research
Exponentiated gradient versus gradient descent for linear predictors
Information and Computation
Large Margin Classification Using the Perceptron Algorithm
Machine Learning - The Eleventh Annual Conference on Computational Learning Theory
Gambling in a rigged casino: The adversarial multi-armed bandit problem
FOCS '95 Proceedings of the 36th Annual Symposium on Foundations of Computer Science
Ultraconservative online algorithms for multiclass problems
The Journal of Machine Learning Research
Online convex optimization in the bandit setting: gradient descent without a gradient
SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms
Online multiclass learning by interclass hypothesis sharing
ICML '06 Proceedings of the 23rd international conference on Machine learning
Online Passive-Aggressive Algorithms
The Journal of Machine Learning Research
A primal-dual perspective of online learning algorithms
Machine Learning
The offset tree for learning with partial labels
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
A contextual-bandit approach to personalized news article recommendation
Proceedings of the 19th international conference on World wide web
Exploitation and exploration in a performance based contextual advertising system
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Online learning in adversarial Lipschitz environments
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II
Learning to trade off between exploration and exploitation in multiclass bandit prediction
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Multi-armed bandits with episode context
Annals of Mathematics and Artificial Intelligence
Learning with stochastic inputs and adversarial outputs
Journal of Computer and System Sciences
Distribution-aware online classifiers
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Two
Content recommendation on web portals
Communications of the ACM
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
This paper introduces the Banditron, a variant of the Perceptron [Rosenblatt, 1958], for the multiclass bandit setting. The multiclass bandit setting models a wide range of practical supervised learning applications in which the learner receives only partial feedback about the true label (called "bandit" feedback, in the spirit of multi-armed bandit models). For example, in many web applications users provide only positive "click" feedback, which does not necessarily disclose the true label. The Banditron learns in a multiclass classification setting from bandit feedback that reveals only whether the algorithm's prediction was correct, not necessarily the true label itself. We provide (relative) mistake bounds showing that the Banditron enjoys favorable performance, and our experiments demonstrate the practicality of the algorithm. Furthermore, this paper pays close attention to the important special case in which the data is linearly separable, a problem that has been exhaustively studied in the full-information setting yet is novel in the bandit setting.
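The abstract does not spell out the update rule, so the following is only a minimal sketch of one round of a Banditron-style learner as the algorithm is usually presented: predict greedily, explore with a small probability, observe a single correct/incorrect bit, and apply an importance-weighted Perceptron update. The function name `banditron_step`, the list-of-lists weight layout, and the exploration parameter `gamma` are illustrative assumptions, not taken from the text above.

```python
import random


def banditron_step(W, x, true_label, gamma, rng):
    """Run one round of a Banditron-style update and return the label played.

    W is a list of k weight vectors (one per class) and is modified in place.
    true_label is used only to compute the one-bit feedback; the update
    itself never reads it directly.
    """
    k = len(W)
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W]
    y_hat = max(range(k), key=lambda r: scores[r])  # greedy prediction

    # Explore: play y_hat with probability 1 - gamma, else a uniform label.
    probs = [gamma / k + (1.0 - gamma if r == y_hat else 0.0) for r in range(k)]
    y_played = rng.choices(range(k), weights=probs)[0]

    # Bandit feedback: a single bit, not the true label.
    correct = (y_played == true_label)

    # Importance-weighted estimate of the multiclass Perceptron update:
    # promote the played label only when it was confirmed correct,
    # and always demote the greedy prediction.
    if correct:
        scale = 1.0 / probs[y_played]
        W[y_played] = [w + scale * xi for w, xi in zip(W[y_played], x)]
    W[y_hat] = [w - xi for w, xi in zip(W[y_hat], x)]
    return y_played
```

Under these assumptions, a caller would hold `W` as `k` zero vectors, pass a `random.Random` instance as `rng`, and call `banditron_step` once per example. Dividing by the sampling probability makes the expected update match the full-information Perceptron update, which is the reason for exploring at all.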