Reinforcement learning and the Bayesian control rule

  • Authors: Pedro Alejandro Ortega; Daniel Alexander Braun; Simon Godsill
  • Affiliations: Department of Engineering, University of Cambridge, Cambridge, UK (all authors)
  • Venue: AGI'11: Proceedings of the 4th International Conference on Artificial General Intelligence
  • Year: 2011

Abstract

We present an actor-critic scheme for reinforcement learning in complex domains. The main contribution is to show that planning and I/O dynamics can be separated such that an intractable planning problem reduces to a simple multi-armed bandit problem, where each lever stands for a potentially arbitrarily complex policy. Furthermore, we use the Bayesian control rule to construct an adaptive bandit player that is universal with respect to a given class of optimal bandit players, thus indirectly constructing an adaptive agent that is universal with respect to a given class of policies.
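The reduction described in the abstract can be illustrated with a minimal sketch in which each bandit lever is an entire policy and pulling a lever means running that policy for one episode. The sketch assumes binary episode rewards and a Beta-Bernoulli model, and it uses posterior (Thompson) sampling as a stand-in for the adaptive bandit player; for bandit problems the Bayesian control rule is generally understood to take this sampling form, but the paper's actual construction may differ. All names here (`make_policy`, `thompson_policy_bandit`, the success probabilities) are hypothetical and chosen only for illustration.

```python
import random


def make_policy(success_prob):
    """Hypothetical stand-in for an arbitrarily complex policy.

    Running the policy for one episode yields a binary reward; the internal
    I/O dynamics are hidden behind this single call.
    """
    def run_episode(rng):
        return 1 if rng.random() < success_prob else 0
    return run_episode


def thompson_policy_bandit(policies, rounds, seed=0):
    """Choose among policies by posterior sampling (assumed Beta-Bernoulli model)."""
    rng = random.Random(seed)
    n = len(policies)
    successes = [1] * n  # Beta(1, 1) uniform prior for every lever
    failures = [1] * n
    pulls = [0] * n
    for _ in range(rounds):
        # Sample one plausible success rate per lever from its posterior.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(n)]
        # Act as if the sampled rates were the true ones.
        arm = max(range(n), key=samples.__getitem__)
        reward = policies[arm](rng)
        pulls[arm] += 1
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


# Three hypothetical policies with different (unknown to the player) success rates.
policies = [make_policy(0.2), make_policy(0.5), make_policy(0.8)]
pulls = thompson_policy_bandit(policies, rounds=2000)
print(pulls)
```

In this sketch the player never inspects how a policy works; it only observes episode outcomes, which is what lets the planning problem be treated separately from the I/O dynamics. Over many rounds the pull counts should concentrate on the policy with the highest success rate.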