Online learning with constraints

  • Authors:
  • Shie Mannor;John N. Tsitsiklis

  • Affiliations:
  • Department of Electrical and Computer Engingeering, McGill University, Québec;Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA

  • Venue:
  • COLT'06 Proceedings of the 19th annual conference on Learning Theory
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

We study online learning where the objective of the decision maker is to maximize her average long-term reward given that some average constraints are satisfied along the sample path. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the constraints, had she known Nature’s choices in advance. We show that in general the reward-in-hindsight is not attainable. The convex hull of the reward-in-hindsight function is, however, attainable. For the important case of a single constraint the convex hull turns out to be the highest attainable function. We further provide an explicit strategy that attains this convex hull using a calibrated forecasting rule.