Tree Exploration for Bayesian RL Exploration

  • Authors:
  • Christos Dimitrakakis

  • Venue:
  • CIMCA '08 Proceedings of the 2008 International Conference on Computational Intelligence for Modelling Control & Automation
  • Year:
  • 2008

Abstract

Research in reinforcement learning has produced algorithms for optimal decision making under uncertainty that fall within two main types. The first employs a Bayesian framework, where optimality improves with increased computational time. This is because the resulting planning task takes the form of a dynamic programming problem on a belief tree with an infinite number of states. The second type employs relatively simple algorithms which are shown to suffer small regret within a distribution-free framework. This paper presents a lower bound and a high-probability upper bound on the optimal value function for the nodes in the Bayesian belief tree, which are analogous to similar bounds in POMDPs. The bounds are then used to create more efficient strategies for exploring the tree. The resulting algorithms are compared with the distribution-free algorithm UCB1, as well as a simpler baseline algorithm, on multi-armed bandit problems.
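
To make the abstract's first point concrete: in the Bernoulli bandit case, Bayesian planning is dynamic programming over Beta-posterior belief states, where each node of the belief tree branches on the arm pulled and the reward observed. The sketch below is a minimal illustration under assumed independent Beta(α, β) priors per arm and a fixed truncation depth; it is not the paper's algorithm, which instead uses value-function bounds to explore the tree more selectively.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(beliefs, depth):
    """Exact DP value of a belief node in a Bernoulli bandit.

    beliefs: tuple of (alpha, beta) Beta-posterior parameters, one per arm.
    depth:   remaining pulls; the (infinite) belief tree is truncated here.
    """
    if depth == 0:
        return 0.0
    best = 0.0
    for i, (a, b) in enumerate(beliefs):
        p = a / (a + b)  # posterior mean reward of arm i
        succ = beliefs[:i] + ((a + 1, b),) + beliefs[i + 1:]  # belief after reward 1
        fail = beliefs[:i] + ((a, b + 1),) + beliefs[i + 1:]  # belief after reward 0
        # Expected immediate reward plus expected value of the child beliefs.
        q = p * (1.0 + value(succ, depth - 1)) + (1.0 - p) * value(fail, depth - 1)
        best = max(best, q)
    return best

# Two arms under uniform Beta(1, 1) priors, planning four steps ahead.
print(value(((1, 1), (1, 1)), 4))
```

With k arms the tree has (2k)^d nodes at depth d, which is why bounds that let a search expand or discard subtrees selectively are valuable.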
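
For the distribution-free comparator, UCB1 (Auer, Cesa-Bianchi, and Fischer, 2002) pulls the arm maximizing the empirical mean plus the exploration bonus √(2 ln t / nᵢ). A self-contained sketch on a Bernoulli bandit follows; the regret bookkeeping and function name are illustrative additions, not code from the paper.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit and return the cumulative regret.

    arm_means: true success probabilities (unknown to the algorithm).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # pulls per arm
    totals = [0.0] * n_arms  # summed rewards per arm
    regret = 0.0
    best = max(arm_means)

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play each arm once to initialise the estimates
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            arm = max(range(n_arms),
                      key=lambda i: totals[i] / counts[i]
                                    + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        regret += best - arm_means[arm]
    return regret

print(ucb1([0.3, 0.5, 0.7], horizon=10_000))
```

UCB1's regret guarantee requires no prior over the arms, which is what the abstract means by a distribution-free framework.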