Bayesian model-based reinforcement learning can be formulated as a partially observable Markov decision process (POMDP), providing a principled framework for optimally balancing exploitation and exploration; a POMDP solver can then, in principle, be applied to the resulting problem. If the prior distribution over the environment's dynamics is a product of Dirichlet distributions, the POMDP's optimal value function can be represented by a set of multivariate polynomials. Unfortunately, the size of these polynomials grows exponentially with the problem horizon. In this paper, we examine the use of an online Monte-Carlo tree search (MCTS) algorithm for large POMDPs to solve the Bayesian reinforcement learning problem online. We show that such an algorithm can successfully search for a near-optimal policy. In addition, we examine the use of a parameter tying method to keep the model search space small, and propose the use of nested mixtures of tied models to increase the robustness of the method when the prior information does not allow the structure of the tied models to be specified exactly. Experiments show that the proposed methods substantially improve the scalability of current Bayesian reinforcement learning methods.
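To make the approach concrete, below is a minimal sketch (not the authors' implementation) of how an online MCTS planner can be combined with a product-of-Dirichlets prior: observed transitions increment per-(state, action) Dirichlet counts, each simulation draws a transition model from the posterior at the root, and a UCT-style search on the sampled MDP estimates root action values. The class and parameter names (BayesAdaptiveMCTS, n_sims, ucb_c, a known expected-reward table) are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of online Monte-Carlo tree search for Bayesian RL with a
# product-of-Dirichlets prior over transition dynamics (assumed setup, not the
# paper's exact algorithm). Each simulation samples a model at the root and
# runs a depth-limited UCT search on it.

import math
from collections import defaultdict

import numpy as np


class BayesAdaptiveMCTS:
    def __init__(self, n_states, n_actions, rewards, gamma=0.95,
                 n_sims=500, depth=15, ucb_c=1.4, prior_count=1.0):
        self.nS, self.nA = n_states, n_actions
        self.rewards = rewards            # rewards[s][a]: expected reward, assumed known
        self.gamma, self.n_sims, self.depth, self.ucb_c = gamma, n_sims, depth, ucb_c
        # Dirichlet pseudo-counts over next states for every (state, action) pair.
        self.counts = np.full((n_states, n_actions, n_states), float(prior_count))

    def update(self, s, a, s_next):
        """Bayesian update: increment the Dirichlet count of the observed transition."""
        self.counts[s, a, s_next] += 1.0

    def _sample_model(self):
        """Draw one full transition matrix P[s, a, :] from the Dirichlet posterior."""
        P = np.empty_like(self.counts)
        for s in range(self.nS):
            for a in range(self.nA):
                P[s, a] = np.random.dirichlet(self.counts[s, a])
        return P

    def plan(self, s0):
        """Run n_sims simulations from s0 and return the greedy root action."""
        N = defaultdict(int)      # visit counts per (state, depth)
        Nsa = defaultdict(int)    # visit counts per (state, depth, action)
        Q = defaultdict(float)    # running mean return per (state, depth, action)
        for _ in range(self.n_sims):
            P = self._sample_model()           # root sampling of a model
            self._simulate(s0, 0, P, N, Nsa, Q)
        return max(range(self.nA), key=lambda a: Q[(s0, 0, a)])

    def _simulate(self, s, d, P, N, Nsa, Q):
        if d >= self.depth:
            return 0.0

        # UCB1 action selection; untried actions are expanded first.
        def ucb(a):
            if Nsa[(s, d, a)] == 0:
                return float("inf")
            return Q[(s, d, a)] + self.ucb_c * math.sqrt(
                math.log(N[(s, d)] + 1) / Nsa[(s, d, a)])

        a = max(range(self.nA), key=ucb)
        s_next = np.random.choice(self.nS, p=P[s, a])
        ret = self.rewards[s][a] + self.gamma * self._simulate(s_next, d + 1, P, N, Nsa, Q)

        # Incremental backup of the mean return along the simulated path.
        N[(s, d)] += 1
        Nsa[(s, d, a)] += 1
        Q[(s, d, a)] += (ret - Q[(s, d, a)]) / Nsa[(s, d, a)]
        return ret
```

In this sketch, sampling a fresh model per simulation keeps the belief out of the tree nodes, so the search never has to enumerate the exponentially large space of posterior beliefs; parameter tying would enter by sharing Dirichlet counts across (state, action) pairs assumed to have identical dynamics, shrinking the model search space further.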