Tree Exploration for Bayesian RL Exploration

  • Authors:
  • Christos Dimitrakakis

  • Venue:
  • CIMCA '08 Proceedings of the 2008 International Conference on Computational Intelligence for Modelling Control & Automation
  • Year:
  • 2008

Abstract

Research in reinforcement learning has produced algorithms for optimal decision making under uncertainty that fall within two main types. The first employs a Bayesian framework, where optimality improves with increased computational time. This is because the resulting planning task takes the form of a dynamic programming problem on a belief tree with an infinite number of states. The second type employs relatively simple algorithms which are shown to suffer small regret within a distribution-free framework. This paper presents a lower bound and a high-probability upper bound on the optimal value function for the nodes in the Bayesian belief tree, which are analogous to similar bounds in POMDPs. The bounds are then used to create more efficient strategies for exploring the tree. The resulting algorithms are compared with the distribution-free algorithm UCB1, as well as a simpler baseline algorithm, on multi-armed bandit problems.
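
To make the abstract's first point concrete: in the Bernoulli bandit case, Bayesian planning is dynamic programming over Beta-posterior belief states, where each node of the belief tree branches on the arm pulled and the reward observed. The sketch below is a minimal illustration under assumed independent Beta(α, β) priors per arm and a fixed truncation depth; it is not the paper's algorithm, which instead uses value-function bounds to explore the tree more selectively.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(beliefs, depth):
    """Exact DP value of a belief node in a Bernoulli bandit.

    beliefs: tuple of (alpha, beta) Beta-posterior parameters, one per arm.
    depth:   remaining pulls; the (infinite) belief tree is truncated here.
    """
    if depth == 0:
        return 0.0
    best = 0.0
    for i, (a, b) in enumerate(beliefs):
        p = a / (a + b)  # posterior mean reward of arm i
        succ = beliefs[:i] + ((a + 1, b),) + beliefs[i + 1:]  # belief after reward 1
        fail = beliefs[:i] + ((a, b + 1),) + beliefs[i + 1:]  # belief after reward 0
        # Expected immediate reward plus expected value of the child beliefs.
        q = p * (1.0 + value(succ, depth - 1)) + (1.0 - p) * value(fail, depth - 1)
        best = max(best, q)
    return best

# Two arms under uniform Beta(1, 1) priors, planning four steps ahead.
print(value(((1, 1), (1, 1)), 4))
```

With k arms the tree has (2k)^d nodes at depth d, which is why bounds that let a search expand or discard subtrees selectively are valuable.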
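
For the distribution-free comparator, UCB1 (Auer, Cesa-Bianchi, and Fischer, 2002) pulls the arm maximizing the empirical mean plus the exploration bonus √(2 ln t / nᵢ). A self-contained sketch on a Bernoulli bandit follows; the regret bookkeeping and function name are illustrative additions, not code from the paper.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit and return the cumulative regret.

    arm_means: true success probabilities (unknown to the algorithm).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # pulls per arm
    totals = [0.0] * n_arms  # summed rewards per arm
    regret = 0.0
    best = max(arm_means)

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play each arm once to initialise the estimates
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            arm = max(range(n_arms),
                      key=lambda i: totals[i] / counts[i]
                                    + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        regret += best - arm_means[arm]
    return regret

print(ucb1([0.3, 0.5, 0.7], horizon=10_000))
```

UCB1's regret guarantee requires no prior over the arms, which is what the abstract means by a distribution-free framework.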