EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
Linear Bayesian reinforcement learning
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Research in reinforcement learning has produced algorithms for optimal decision making under uncertainty that fall within two main types. The first employs a Bayesian framework, where optimality improves with increased computational time. This is because the resulting planning task takes the form of a dynamic programming problem on a belief tree with an infinite number of states. The second type employs relatively simple algorithms which are shown to suffer small regret within a distribution-free framework. This paper presents a lower bound and a high-probability upper bound on the optimal value function for the nodes in the Bayesian belief tree, which are analogous to similar bounds in POMDPs. The bounds are then used to create more efficient strategies for exploring the tree. The resulting algorithms are compared with the distribution-free algorithm UCB1, as well as a simpler baseline algorithm, on multi-armed bandit problems.
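For context, the UCB1 baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard UCB1 index rule on a Bernoulli bandit, not the paper's implementation; the arm means, horizon, and seed are arbitrary choices for the example:

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit; return per-arm pull counts."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialization: play each arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

# With well-separated arms, UCB1 concentrates its pulls on the best arm.
counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
```

The exploration bonus shrinks as an arm is pulled more often, so suboptimal arms are sampled only logarithmically often, which is the source of UCB1's small-regret guarantee in the distribution-free setting the abstract contrasts with the Bayesian one.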