Potential-based reward shaping has been shown to be a powerful method for improving the convergence rate of reinforcement learning agents, and a flexible, principled way to incorporate background knowledge into temporal-difference learning. However, the question remains of how to compute the potential that shapes the reward given to the learning agent. In this paper we propose a solution for reinforcement learning with state space discretisation. In particular, we show that the potential function can be learned online, in parallel with the actual reinforcement learning process: while the Q-function is learned for states defined by a given grid, a V-function over a lower-resolution discretisation is learned in parallel and used to approximate the potential for the ground-level learner. The novel algorithm is presented and evaluated experimentally.
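The idea can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the environment (a small chain walk), grid sizes, learning rates and exploration schedule are all illustrative assumptions. It shows the two parallel learners — Q-learning on the fine grid, and TD(0) on a coarse aggregation of states whose V-values serve as the potential Φ, contributing the shaping term F = γΦ(s') − Φ(s) to the reward.

```python
import random

# Illustrative assumptions: an 8-state chain with the goal at the right end,
# a coarse grid that aggregates pairs of fine states, and fixed step sizes.
N, COARSE = 8, 2
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1

def coarse(s):
    """Map a fine-grid state to its low-resolution cell."""
    return s // COARSE

def step(s, a):
    """Move left (-1) or right (+1); reward 1 only on reaching the goal."""
    s2 = max(0, min(N - 1, s + a))
    return s2, (1.0 if s2 == N - 1 else 0.0), s2 == N - 1

Q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}
V = [0.0] * (N // COARSE)  # coarse V-function, learned in parallel (the potential)

def policy(s):
    """Epsilon-greedy on Q, with random tie-breaking."""
    if random.random() < EPS or Q[(s, -1)] == Q[(s, 1)]:
        return random.choice((-1, 1))
    return -1 if Q[(s, -1)] > Q[(s, 1)] else 1

random.seed(0)
for episode in range(500):
    s = 0
    for _ in range(200):
        a = policy(s)
        s2, r, done = step(s, a)
        # TD(0) update of the coarse V-function used as the potential.
        V[coarse(s)] += ALPHA * (r + GAMMA * V[coarse(s2)] - V[coarse(s)])
        # Potential-based shaping term added to the ground-level reward.
        F = GAMMA * V[coarse(s2)] - V[coarse(s)]
        best_next = 0.0 if done else max(Q[(s2, -1)], Q[(s2, 1)])
        Q[(s, a)] += ALPHA * (r + F + GAMMA * best_next - Q[(s, a)])
        s = s2
        if done:
            break
```

Because the shaping term is potential-based, the optimal policy on the fine grid is unchanged; the learned coarse V-values merely redistribute reward along the way to the goal, which is what speeds up convergence.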