Hessian matrix distribution for Bayesian policy gradient reinforcement learning

Authors:
Ngo Anh Vien; Hwanjo Yu; TaeChoong Chung
Affiliations:
Ngo Anh Vien, TaeChoong Chung: Artificial Intelligence Laboratory, Department of Computer Engineering, School of Electronics and Information, Kyung Hee University, 1 Seocheon, Giheung, Yongin, Gyeonggi 446-701, South Korea
Hwanjo Yu: Data Mining Laboratory, Department of Computer Science and Engineering, Pohang University of Science and Technology (POSTECH), South Korea
Venue:
Information Sciences: an International Journal
Year:
2011

References (17):
Exploiting generative models in discriminative classifiers. Proceedings of the 1998 conference on Advances in neural information processing systems II.
Introduction to Reinforcement Learning.
Neuro-Dynamic Programming.
Fuzzy control rules extraction from perception-based information using computing with words. Information Sciences—Informatics and Computer Science: An International Journal, Special issue: Intelligent information systems and applications.
A generic architecture for adaptive agents based on reinforcement learning. Information Sciences—Informatics and Computer Science: An International Journal, Special issue: Bio-inspired systems (BIS).
Tree-Based Batch Mode Reinforcement Learning. The Journal of Machine Learning Research.
Reinforcement learning with Gaussian processes. ICML '05: Proceedings of the 22nd international conference on Machine learning.
Bayesian actor-critic algorithms. Proceedings of the 24th international conference on Machine learning.
A fuzzy Actor-Critic reinforcement learning network. Information Sciences: an International Journal.
Adaptive evolutionary programming based on reinforcement learning. Information Sciences: an International Journal.
Online kernel selection for Bayesian reinforcement learning. Proceedings of the 25th international conference on Machine learning.
Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research.
Covariant policy search. IJCAI'03: Proceedings of the 18th international joint conference on Artificial intelligence.
Natural actor-critic algorithms. Automatica (Journal of IFAC).
Natural actor-critic. ECML'05: Proceedings of the 16th European conference on Machine Learning.
Neural fitted q iteration – first experiences with a data efficient neural reinforcement learning method. ECML'05: Proceedings of the 16th European conference on Machine Learning.
Adaptive stock trading with dynamic asset allocation using reinforcement learning. Information Sciences: an International Journal.

Cited by (5):
Efficient visual tracking using particle filter with incremental likelihood calculation. Information Sciences: an International Journal.
Induced states in a decision tree constructed by Q-learning. Information Sciences: an International Journal.
Simultaneous policy update algorithms for learning the solution of linear continuous-time H∞ state feedback control. Information Sciences: an International Journal.
Monte-Carlo tree search for Bayesian reinforcement learning. Applied Intelligence.
Learning via human feedback in continuous state and action spaces. Applied Intelligence.

Abstract

Bayesian policy gradient algorithms have recently been proposed for modeling the policy gradient of the performance measure in reinforcement learning as a Gaussian process. Compared with conventional Monte-Carlo policy gradient algorithms, these methods have been shown to reduce both the variance of the gradient estimates and the number of samples needed to obtain accurate estimates. In this paper, we propose an improvement over previous Bayesian frameworks for the policy gradient: we use the Hessian matrix distribution as a learning rate schedule to improve the performance of the Bayesian policy gradient algorithm in terms of both variance and the number of samples. As with the policy gradient distributions, the Hessian matrix distributions are estimated using Bayesian quadrature. We prove that the posterior mean of the Hessian distribution estimate is symmetric, one of the important properties of a Hessian matrix. Moreover, we prove that, with an appropriate choice of kernel, the computational complexity of the Hessian distribution estimate is equal to that of the policy gradient distribution estimate. Using simulations, we show encouraging experimental results comparing the proposed algorithm to the Bayesian policy gradient and Bayesian policy natural gradient algorithms described in Ghavamzadeh and Engel [10].
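The idea of using a Hessian estimate in place of a hand-tuned scalar learning rate can be illustrated with a Newton-style ascent step. This is a minimal sketch under stated assumptions, not the authors' algorithm: it performs no Bayesian quadrature, and the gradient and Hessian are computed exactly on a hypothetical quadratic objective J(θ) = -½ θᵀAθ + bᵀθ, standing in for the posterior means the paper estimates from sampled trajectories. The helper name `hessian_scaled_step`, the damping constant, and the fallback to a plain gradient step when -H is not positive definite are all illustrative assumptions.

```python
import numpy as np


def hessian_scaled_step(theta, grad, hess, step_size=1.0, damping=1e-6):
    """Hypothetical helper: one Newton-style ascent step on J(theta).

    `grad` and `hess` stand in for posterior-mean estimates of the policy
    gradient and Hessian (assumption: supplied directly, not estimated here).
    """
    # Symmetrize the estimate; this is a no-op if it is already symmetric,
    # as the abstract states holds for the posterior mean.
    hess = 0.5 * (hess + hess.T)
    # When maximizing, H is negative definite near an optimum, so we
    # precondition with -H plus a small damping term.
    neg_h = -hess + damping * np.eye(len(theta))
    try:
        np.linalg.cholesky(neg_h)  # raises LinAlgError if -H is not positive definite
    except np.linalg.LinAlgError:
        # Assumed fallback: ordinary gradient ascent with the scalar step size.
        return theta + step_size * grad
    # Solve (-H) d = g rather than forming the inverse explicitly.
    direction = np.linalg.solve(neg_h, grad)
    return theta + step_size * direction


# Hypothetical toy objective J(theta) = -0.5 theta^T A theta + b^T theta,
# whose exact gradient is b - A theta and whose Hessian is -A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

theta0 = np.zeros(2)
theta1 = hessian_scaled_step(theta0, b - A @ theta0, -A)
print(theta1)  # approximately the optimum A^{-1} b = [0.2, 0.4]
```

On a quadratic objective a single Hessian-scaled step lands (up to the damping term) on the optimum, whereas a plain gradient step with `step_size=1.0` from the same start would move to [1.0, 1.0] and overshoot. In the Bayesian setting the gradient and Hessian would instead be noisy posterior means, so damping or a step size below 1 would likely be needed in practice.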