Online learning with constraints

Authors:
Shie Mannor;John N. Tsitsiklis
Affiliations:
Department of Electrical and Computer Engineering, McGill University, Québec;Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA
Venue:
COLT'06 Proceedings of the 19th annual conference on Learning Theory
Year:
2006

Abstract

We study online learning where the objective of the decision maker is to maximize her average long-term reward given that some average constraints are satisfied along the sample path. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the constraints, had she known Nature’s choices in advance. We show that in general the reward-in-hindsight is not attainable. The convex hull of the reward-in-hindsight function is, however, attainable. For the important case of a single constraint the convex hull turns out to be the highest attainable function. We further provide an explicit strategy that attains this convex hull using a calibrated forecasting rule.
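
The definition of the reward-in-hindsight above can be written out as a constrained maximization. The following is a sketch of one natural formalization, not a quotation from the paper. All symbols are assumptions introduced here for illustration: A and B are the finite action sets of the decision maker and of Nature, r(a,b) is the reward, c(a,b) is the cost vector, T is the set of allowed average costs, q is the empirical distribution of Nature's actions, and p is a mixed action of the decision maker.

```latex
% Assumed notation (not given in the abstract):
%   \Delta(A) : probability distributions over the decision maker's actions
%   q         : empirical distribution of Nature's choices over B
%   r(a,b)    : reward,  c(a,b) : cost vector,  T : constraint set
r^{*}(q) \;=\; \max_{p \in \Delta(A)}
  \Big\{ \sum_{a \in A} \sum_{b \in B} p(a)\, q(b)\, r(a,b)
  \;:\; \sum_{a \in A} \sum_{b \in B} p(a)\, q(b)\, c(a,b) \in T \Big\}

% One reading of "attainable" for a target function f, with \hat{r}_t the
% average reward and \hat{q}_t the empirical distribution of Nature's
% choices up to time t (the average cost is also required to approach T):
\liminf_{t \to \infty} \big( \hat{r}_t - f(\hat{q}_t) \big) \;\ge\; 0
  \quad \text{almost surely}
```

Under this reading, the abstract's negative result says that f = r^{*} cannot in general be guaranteed, while the convex hull of r^{*} can.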