We study Bayesian reinforcement learning (RL) as a solution to the exploration-exploitation dilemma. Because full Bayesian planning is intractable except in special cases, previous work has proposed several approximation methods; however, these are often computationally expensive or limited to Dirichlet priors. In this paper, we propose a fast, polynomial-time algorithm that achieves a near-Bayes-optimal policy under any prior distribution that is not greatly misspecified. Perhaps more interestingly, the proposed algorithm naturally avoids being misled by incorrect beliefs while still exploiting the useful parts of the prior information, and it can perform well even when an utterly misspecified prior is assigned. In that case, the algorithm instead follows PAC-MDP behavior, provided an existing PAC-MDP algorithm does so. On a standard benchmark problem, the proposed algorithm outperformed the algorithms it was compared against.
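As context for the Dirichlet-prior methods the abstract refers to, the sketch below shows one standard approximate Bayesian RL scheme: posterior-sampling RL with independent Dirichlet priors over the transition distributions of a finite MDP. This is a minimal illustration of the general setting only, not the algorithm proposed in the paper; the class name `PSRLAgent`, the hyperparameters, and the empirical reward-estimation scheme are assumptions made for the example.

```python
import numpy as np

class PSRLAgent:
    """Posterior-sampling RL on a finite MDP with Dirichlet transition priors.

    Illustrative sketch of the Dirichlet-prior setting only -- NOT the
    algorithm proposed in this paper.
    """

    def __init__(self, n_states, n_actions, gamma=0.95, prior=1.0):
        self.nS, self.nA, self.gamma = n_states, n_actions, gamma
        # Dirichlet pseudo-counts over next states for every (s, a) pair.
        self.counts = np.full((n_states, n_actions, n_states), prior)
        # Empirical means of observed rewards (rewards assumed bounded).
        self.r_sum = np.zeros((n_states, n_actions))
        self.r_cnt = np.zeros((n_states, n_actions))

    def sample_mdp(self):
        # Draw one complete transition model from the Dirichlet posterior.
        P = np.array([[np.random.dirichlet(self.counts[s, a])
                       for a in range(self.nA)] for s in range(self.nS)])
        R = self.r_sum / np.maximum(self.r_cnt, 1.0)  # zero where unvisited
        return P, R

    def plan(self, P, R, n_iter=200):
        # Value iteration on the sampled MDP; returns a greedy policy.
        V = np.zeros(self.nS)
        for _ in range(n_iter):
            Q = R + self.gamma * (P @ V)  # shape (nS, nA)
            V = Q.max(axis=1)
        return Q.argmax(axis=1)

    def update(self, s, a, r, s_next):
        # Conjugate Dirichlet posterior update: just increment counts.
        self.counts[s, a, s_next] += 1.0
        self.r_sum[s, a] += r
        self.r_cnt[s, a] += 1.0
```

An agent of this kind alternates sampling a model, planning against it, and acting while updating its counts. The hard-coded Dirichlet prior here is exactly the kind of restrictive assumption the abstract argues the proposed method avoids.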