Bayesian model-based reinforcement learning can be formulated as a partially observable Markov decision process (POMDP), providing a principled framework for optimally balancing exploitation and exploration; a POMDP solver can then, in principle, be applied to the resulting problem. If the prior distribution over the environment's dynamics is a product of Dirichlet distributions, the POMDP's optimal value function can be represented by a set of multivariate polynomials. Unfortunately, the size of these polynomials grows exponentially with the problem horizon. In this paper, we examine the use of an online Monte-Carlo tree search (MCTS) algorithm for large POMDPs to solve the Bayesian reinforcement learning problem online. We show that such an algorithm can successfully search for a near-optimal policy. In addition, we examine the use of a parameter tying method to keep the model search space small, and propose the use of nested mixtures of tied models to increase the robustness of the method when the prior information does not allow the structure of the tied models to be specified exactly. Experiments show that the proposed methods substantially improve the scalability of current Bayesian reinforcement learning methods.
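To make the approach concrete, below is a minimal sketch (not the authors' implementation) of how an online MCTS planner can be combined with a product-of-Dirichlets prior: observed transitions increment per-(state, action) Dirichlet counts, each simulation draws a transition model from the posterior at the root, and a UCT-style search on the sampled MDP estimates root action values. The class and parameter names (BayesAdaptiveMCTS, n_sims, ucb_c, a known expected-reward table) are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of online Monte-Carlo tree search for Bayesian RL with a
# product-of-Dirichlets prior over transition dynamics (assumed setup, not the
# paper's exact algorithm). Each simulation samples a model at the root and
# runs a depth-limited UCT search on it.

import math
from collections import defaultdict

import numpy as np


class BayesAdaptiveMCTS:
    def __init__(self, n_states, n_actions, rewards, gamma=0.95,
                 n_sims=500, depth=15, ucb_c=1.4, prior_count=1.0):
        self.nS, self.nA = n_states, n_actions
        self.rewards = rewards            # rewards[s][a]: expected reward, assumed known
        self.gamma, self.n_sims, self.depth, self.ucb_c = gamma, n_sims, depth, ucb_c
        # Dirichlet pseudo-counts over next states for every (state, action) pair.
        self.counts = np.full((n_states, n_actions, n_states), float(prior_count))

    def update(self, s, a, s_next):
        """Bayesian update: increment the Dirichlet count of the observed transition."""
        self.counts[s, a, s_next] += 1.0

    def _sample_model(self):
        """Draw one full transition matrix P[s, a, :] from the Dirichlet posterior."""
        P = np.empty_like(self.counts)
        for s in range(self.nS):
            for a in range(self.nA):
                P[s, a] = np.random.dirichlet(self.counts[s, a])
        return P

    def plan(self, s0):
        """Run n_sims simulations from s0 and return the greedy root action."""
        N = defaultdict(int)      # visit counts per (state, depth)
        Nsa = defaultdict(int)    # visit counts per (state, depth, action)
        Q = defaultdict(float)    # running mean return per (state, depth, action)
        for _ in range(self.n_sims):
            P = self._sample_model()           # root sampling of a model
            self._simulate(s0, 0, P, N, Nsa, Q)
        return max(range(self.nA), key=lambda a: Q[(s0, 0, a)])

    def _simulate(self, s, d, P, N, Nsa, Q):
        if d >= self.depth:
            return 0.0

        # UCB1 action selection; untried actions are expanded first.
        def ucb(a):
            if Nsa[(s, d, a)] == 0:
                return float("inf")
            return Q[(s, d, a)] + self.ucb_c * math.sqrt(
                math.log(N[(s, d)] + 1) / Nsa[(s, d, a)])

        a = max(range(self.nA), key=ucb)
        s_next = np.random.choice(self.nS, p=P[s, a])
        ret = self.rewards[s][a] + self.gamma * self._simulate(s_next, d + 1, P, N, Nsa, Q)

        # Incremental backup of the mean return along the simulated path.
        N[(s, d)] += 1
        Nsa[(s, d, a)] += 1
        Q[(s, d, a)] += (ret - Q[(s, d, a)]) / Nsa[(s, d, a)]
        return ret
```

In this sketch, sampling a fresh model per simulation keeps the belief out of the tree nodes, so the search never has to enumerate the exponentially large space of posterior beliefs; parameter tying would enter by sharing Dirichlet counts across (state, action) pairs assumed to have identical dynamics, shrinking the model search space further.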