Multigrid Reinforcement Learning with Reward Shaping

Authors:
Marek Grześ;Daniel Kudenko
Affiliations:
Department of Computer Science, University of York, York, UK YO10 5DD;Department of Computer Science, University of York, York, UK YO10 5DD
Venue:
ICANN '08 Proceedings of the 18th international conference on Artificial Neural Networks, Part I
Year:
2008

Abstract

Potential-based reward shaping has been shown to be a powerful method to improve the convergence rate of reinforcement learning agents. It is a flexible technique to incorporate background knowledge into temporal-difference learning in a principled way. However, the question remains how to compute the potential which is used to shape the reward that is given to the learning agent. In this paper we propose a way to solve this problem in reinforcement learning with state space discretisation. In particular, we show that the potential function can be learned online in parallel with the actual reinforcement learning process. If the Q-function is learned for states determined by a given grid, a V-function for states with lower resolution can be learned in parallel and used to approximate the potential for ground learning. The novel algorithm is presented and experimentally evaluated.
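
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy chain environment, the step sizes, the coarse-cell mapping `coarse`, and the choice of a plain TD(0) update for the low-resolution V-function are all assumptions made for illustration. The sketch learns a Q-function over ground states with Q-learning, learns a V-function over coarser cells in parallel, and uses that V-function as the potential Phi in the shaping term F(s, s') = gamma * Phi(s') - Phi(s).

```python
import random

GAMMA = 0.99     # discount factor (assumed value)
ALPHA = 0.1      # step size for the ground Q-function (assumed value)
BETA = 0.1       # step size for the coarse V-function (assumed value)
EPSILON = 0.1    # exploration rate (assumed value)
N = 20           # toy chain: ground states 0..N-1, goal at N-1 (hypothetical task)
COARSE = 4       # number of ground states per low-resolution cell
ACTIONS = (-1, +1)


def coarse(s):
    """Map a ground state to its low-resolution cell (hypothetical mapping)."""
    return s // COARSE


def step(s, a):
    """Deterministic toy chain: reward 1 on reaching the goal, 0 otherwise."""
    s2 = min(max(s + a, 0), N - 1)
    done = s2 == N - 1
    return s2, (1.0 if done else 0.0), done


def shaping(V, s, s2, done=False):
    """Shaping term F(s, s') = gamma * Phi(s') - Phi(s), with Phi read from
    the coarse V-function. The potential of a terminal state is taken as 0."""
    phi_next = 0.0 if done else V[coarse(s2)]
    return GAMMA * phi_next - V[coarse(s)]


def run_episode(Q, V, rng, max_steps=200):
    """One episode of Q-learning on the ground states with shaped rewards,
    while the coarse V-function is updated in parallel."""
    s, steps, done = 0, 0, False
    while not done and steps < max_steps:
        if rng.random() < EPSILON:
            a = rng.choice(ACTIONS)
        else:
            # Greedy action with random tie-breaking.
            a = max(ACTIONS, key=lambda b: (Q[(s, b)], rng.random()))
        s2, r, done = step(s, a)

        # Shaping term computed from the current coarse estimate.
        f = shaping(V, s, s2, done)

        # Coarse level: TD(0) on the low-resolution cells (an assumption,
        # the abstract does not specify the update rule).
        z = coarse(s)
        v_next = 0.0 if done else V[coarse(s2)]
        V[z] += BETA * (r + GAMMA * v_next - V[z])

        # Ground level: Q-learning on the shaped reward r + F(s, s').
        q_next = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + f + GAMMA * q_next - Q[(s, a)])

        s, steps = s2, steps + 1
    return steps


def train(episodes=200, seed=0):
    """Train both levels together and return the learned tables."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    V = [0.0] * (N // COARSE)
    for _ in range(episodes):
        run_episode(Q, V, rng)
    return Q, V
```

Calling `train()` returns the ground Q-table and the coarse V-table. Because the coarse table has far fewer entries, its estimates can propagate across the state space in fewer updates, which is the intuition behind using it as a potential for the finer level.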