We study online learning in which the decision maker seeks to maximize her average long-term reward, subject to some average constraints being satisfied along the sample path. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the constraints, had she known Nature’s choices in advance. We show that, in general, the reward-in-hindsight is not attainable. The convex hull of the reward-in-hindsight function is, however, attainable. For the important case of a single constraint, the convex hull turns out to be the highest attainable function. We further provide an explicit strategy that attains this convex hull using a calibrated forecasting rule.
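To make the definition of the reward-in-hindsight concrete, the following is a minimal sketch, not the authors' strategy, for a finite game with a single constraint. It assumes a reward matrix `r[a][b]` and a cost matrix `c[a][b]`, indexed by the decision maker's action `a` and Nature's action `b`, an empirical distribution `q` of Nature's choices, and a cost threshold `c0`. These names and the vertex-enumeration approach are illustrative assumptions, not taken from the paper. With one linear constraint, the best mixed action against `q` is either a feasible pure action or a mixture of two actions at which the constraint is tight, so checking those candidates suffices.

```python
from itertools import combinations


def reward_in_hindsight(r, c, q, c0):
    """Highest expected reward of a fixed mixed action against q,
    subject to expected cost <= c0. Returns None if no mixed action is feasible.

    r, c : hypothetical reward and cost matrices, indexed [a][b]
    q    : empirical distribution of Nature's actions
    c0   : cost threshold (single constraint only)
    """
    n = len(r)
    # Expected reward and cost of each pure action against q.
    R = [sum(q[b] * r[a][b] for b in range(len(q))) for a in range(n)]
    C = [sum(q[b] * c[a][b] for b in range(len(q))) for a in range(n)]

    best = None
    # Candidate 1: pure actions that already satisfy the constraint.
    for a in range(n):
        if C[a] <= c0:
            best = R[a] if best is None else max(best, R[a])
    # Candidate 2: two-action mixtures whose expected cost equals c0 exactly.
    for a, b in combinations(range(n), 2):
        if (C[a] - c0) * (C[b] - c0) < 0:
            w = (c0 - C[b]) / (C[a] - C[b])  # weight placed on action a
            val = w * R[a] + (1 - w) * R[b]
            best = val if best is None else max(best, val)
    return best


# Toy instance (assumed for illustration): action 0 earns reward 1 but incurs
# cost 1 whenever Nature plays 0; action 1 earns nothing and costs nothing.
r = [[1.0, 1.0], [0.0, 0.0]]
c = [[1.0, 0.0], [0.0, 0.0]]
print(reward_in_hindsight(r, c, [1.0, 0.0], 0.5))  # 0.5
print(reward_in_hindsight(r, c, [0.0, 1.0], 0.5))  # 1.0
```

In this toy instance the value drops from 1 to 0.5 as Nature shifts all its weight to action 0, because the decision maker can then afford the rewarding action only half of the time.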