Online Regret Bounds for Markov Decision Processes with Deterministic Transitions

  • Authors:
  • Ronald Ortner

  • Affiliations:
  • University of Leoben, Leoben, Austria A-8700

  • Venue:
  • ALT '08: Proceedings of the 19th International Conference on Algorithmic Learning Theory
  • Year:
  • 2008

Abstract

We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the online regret (with respect to an (ε-)optimal policy) that are logarithmic in the number of steps taken. These bounds also match known asymptotic bounds for the general MDP setting. We also present corresponding lower bounds. As an application, multi-armed bandits with switching cost are considered.