Toward a classification of finite partial-monitoring games

Authors:
András Antos;Gábor Bartók;Dávid Pál;Csaba Szepesvári
Affiliations:
Machine Learning Group, Computer and Automation Research Institute of the Hungarian Academy of Sciences, 13-17 Kende utca, H-1111 Budapest, Hungary;Department of Computing Science, University of Alberta, Edmonton, Alberta, T6G 2E8, Canada;Department of Computing Science, University of Alberta, Edmonton, Alberta, T6G 2E8, Canada;Department of Computing Science, University of Alberta, Edmonton, Alberta, T6G 2E8, Canada
Venue:
Theoretical Computer Science
Year:
2013

Abstract

Partial-monitoring games constitute a mathematical framework for sequential decision making problems with imperfect feedback: the learner repeatedly chooses an action, the opponent responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his total cumulative loss. We make progress toward the classification of these games based on their minimax expected regret. Namely, we classify almost all games with two outcomes and a finite number of actions: we show that their minimax expected regret is either zero, $\tilde{\Theta}(\sqrt{T})$, $\Theta(T^{2/3})$, or $\Theta(T)$, and we give a simple and efficiently computable classification of these four classes of games. Our hope is that the result can serve as a stepping stone toward classifying all finite partial-monitoring games.
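
The interaction protocol described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the function `play`, the matrices `LOSS` and `FEEDBACK`, and the fixed outcome sequence are all hypothetical. Regret is computed in the usual way, as the learner's cumulative loss minus the cumulative loss of the best fixed action in hindsight; the abstract itself does not spell out this definition.

```python
from typing import Callable, List, Sequence, Tuple

History = List[Tuple[int, str]]  # (action played, feedback signal received)


def play(
    loss: Sequence[Sequence[float]],
    feedback: Sequence[Sequence[str]],
    choose: Callable[[History], int],
    outcomes: Sequence[int],
) -> Tuple[float, float]:
    """Run one game and return (learner's cumulative loss, regret).

    loss[a][o] and feedback[a][o] are fixed functions of action a and
    outcome o. The learner only ever sees the feedback signals.
    """
    n_actions = len(loss)
    history: History = []
    learner_loss = 0.0
    fixed_loss = [0.0] * n_actions  # loss of playing each action every round

    for outcome in outcomes:
        action = choose(history)               # learner picks an action
        learner_loss += loss[action][outcome]  # loss is suffered, not revealed
        history.append((action, feedback[action][outcome]))
        for a in range(n_actions):
            fixed_loss[a] += loss[a][outcome]

    regret = learner_loss - min(fixed_loss)
    return learner_loss, regret


# Hypothetical game with two actions and two outcomes.
LOSS = [[0.0, 1.0], [1.0, 0.0]]
FEEDBACK = [["a", "b"], ["c", "c"]]  # action 1 reveals nothing about the outcome


def always_first(history: History) -> int:
    """A trivial learner that ignores its feedback."""
    return 0


total, regret = play(LOSS, FEEDBACK, always_first, outcomes=[0, 1, 1, 1])
print(total, regret)
```

In this toy run the learner never adapts, so its regret grows with the number of rounds in which the other action would have been better. The classification in the abstract concerns how slowly the best possible learner can make the expected regret grow with the number of rounds T.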