Toward a classification of finite partial-monitoring games
ALT'10 Proceedings of the 21st international conference on Algorithmic learning theory
Partial-monitoring games constitute a mathematical framework for sequential decision-making problems with imperfect feedback: the learner repeatedly chooses an action, the opponent responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his total cumulative loss. We make progress toward the classification of these games based on their minimax expected regret. Namely, we classify almost all games with two outcomes and a finite number of actions: we show that their minimax expected regret is either zero, $\widetilde{\Theta}(\sqrt{T})$, $\Theta(T^{2/3})$, or $\Theta(T)$, and we give a simple and efficiently computable classification of these four classes of games. Our hope is that the result can serve as a stepping stone toward classifying all finite partial-monitoring games.
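
The interaction protocol described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`play`, `uniform_learner`), the example matrices `LOSS` and `FEEDBACK`, and the uniformly random learner are all hypothetical and are not taken from the paper. The sketch shows the one structural point of the framework, namely that the learner's strategy sees only its past actions and feedback signals, never the losses or the outcomes.

```python
import random


def play(loss, feedback, outcomes, choose, seed=0):
    """Run a finite partial-monitoring game for len(outcomes) rounds.

    loss[a][o] and feedback[a][o] are fixed functions of action a and outcome o.
    Returns (cumulative loss, regret against the best fixed action in hindsight).
    """
    rng = random.Random(seed)
    n_actions = len(loss)
    history = []  # (action, feedback signal) pairs: all the learner ever observes
    total = 0.0
    for outcome in outcomes:
        action = choose(history, n_actions, rng)
        total += loss[action][outcome]  # suffered, but not revealed to the learner
        history.append((action, feedback[action][outcome]))
    best_fixed = min(
        sum(loss[a][o] for o in outcomes) for a in range(n_actions)
    )
    return total, total - best_fixed


def uniform_learner(history, n_actions, rng):
    """Placeholder strategy (an assumption): ignores feedback, plays at random."""
    return rng.randrange(n_actions)


# Hypothetical game with two outcomes and three actions. Actions 0 and 1 return
# the same signal for both outcomes (no information); action 2 reveals the
# outcome but always costs 0.6.
LOSS = [[0.0, 1.0], [1.0, 0.0], [0.6, 0.6]]
FEEDBACK = [["x", "x"], ["x", "x"], ["a", "b"]]

if __name__ == "__main__":
    rng = random.Random(1)
    outcomes = [rng.randrange(2) for _ in range(1000)]
    total, regret = play(LOSS, FEEDBACK, outcomes, uniform_learner)
    print(f"cumulative loss = {total:.1f}, regret = {regret:.1f}")
```

In this hypothetical game the learner must pay for information by playing action 2, which is the kind of trade-off that separates the regret classes named in the abstract. The sketch does not attempt to compute which class a given game belongs to.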