Reinforcement learning and the Bayesian control rule

Authors:
Pedro Alejandro Ortega; Daniel Alexander Braun; Simon Godsill
Affiliations:
Department of Engineering, University of Cambridge, Cambridge, UK; Department of Engineering, University of Cambridge, Cambridge, UK; Department of Engineering, University of Cambridge, Cambridge, UK
Venue:
AGI'11 Proceedings of the 4th international conference on Artificial general intelligence
Year:
2011

Abstract

We present an actor-critic scheme for reinforcement learning in complex domains. The main contribution is to show that planning and I/O dynamics can be separated such that an intractable planning problem reduces to a simple multi-armed bandit problem, where each lever stands for a potentially arbitrarily complex policy. Furthermore, we use the Bayesian control rule to construct an adaptive bandit player that is universal with respect to a given class of optimal bandit players, thus indirectly constructing an adaptive agent that is universal with respect to a given class of policies.
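The reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' construction: it assumes each lever is a policy whose episode returns a binary reward, places a Beta(1, 1) prior on each lever's success probability, and selects levers by posterior sampling (sample one hypothesis from the posterior, then act as if it were true), which is the behaviour usually associated with the Bayesian control rule in the bandit setting. The names make_policy and posterior_sampling_bandit are hypothetical.

```python
import random


def make_policy(success_prob, rng):
    """Hypothetical stand-in for an arbitrarily complex policy.

    Running the policy for one episode yields a binary return
    (1 = success, 0 = failure). The bandit player never looks inside.
    """
    def run_episode():
        return 1 if rng.random() < success_prob else 0
    return run_episode


def posterior_sampling_bandit(policies, n_rounds, rng):
    """Choose among policies by sampling from a Beta posterior per lever."""
    n = len(policies)
    # Beta(1, 1) prior (assumption) on each lever's unknown success rate.
    wins = [1] * n
    losses = [1] * n
    pulls = [0] * n
    for _ in range(n_rounds):
        # Sample one plausible success rate per lever from its posterior...
        samples = [rng.betavariate(wins[i], losses[i]) for i in range(n)]
        # ...and act as if the sample were true: run the best-looking policy.
        chosen = max(range(n), key=samples.__getitem__)
        reward = policies[chosen]()
        # Conjugate update of the chosen lever's posterior.
        wins[chosen] += reward
        losses[chosen] += 1 - reward
        pulls[chosen] += 1
    return pulls


rng = random.Random(0)
policies = [make_policy(p, rng) for p in (0.2, 0.5, 0.8)]
pulls = posterior_sampling_bandit(policies, 2000, rng)
print(pulls)  # pull counts per lever; the 0.8 lever should dominate
```

The point of the sketch is the separation of concerns: the bandit player only sees which lever was pulled and what return came back, so the policies behind the levers can be as complex as needed without changing the selection rule.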