2010 Special Issue: Online learning of shaping rewards in reinforcement learning

  • Authors:
  • Marek Grześ; Daniel Kudenko

  • Affiliations:
  • Department of Computer Science, University of York, York, YO10 5DD, UK (both authors)

  • Venue:
  • Neural Networks
  • Year:
  • 2010


Abstract

Potential-based reward shaping has been shown to be a powerful method for improving the convergence rate of reinforcement learning agents. It is a flexible technique for incorporating background knowledge into temporal-difference learning in a principled way. However, the question remains of how to compute the potential function used to shape the reward given to the learning agent. In this paper, we show how, in the absence of knowledge to define the potential function manually, this function can be learned online, in parallel with the actual reinforcement learning process. Two cases are considered. The first solution, based on multi-grid discretisation, is designed for model-free reinforcement learning. The second is an approach for the prototypical model-based R-max algorithm; it learns the potential function using the free-space assumption about transitions in the environment. Two novel algorithms are presented and evaluated.
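The abstract does not include the algorithms themselves. For orientation: in potential-based shaping (Ng, Harada and Russell, 1999), the agent receives an extra reward F(s, s') = γΦ(s') − Φ(s) on each transition, which is known to preserve the optimal policy. Below is a minimal sketch of Q-learning with a potential function that is itself estimated online, in the spirit of the abstract. The `env` interface (`reset`, `step`, `actions`) and the simple TD-style update for Φ are illustrative assumptions, standing in for the paper's multi-grid and R-max constructions rather than reproducing them.

```python
import random
from collections import defaultdict

def shaping_reward(phi, s, s_next, gamma):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * phi[s_next] - phi[s]

def q_learning_with_learned_shaping(env, episodes=500, alpha=0.1,
                                    gamma=0.99, epsilon=0.1, beta=0.05):
    """Q-learning where the potential Phi is learned online, in parallel
    with the main learning process. The Phi update below is a plain
    TD(0) value estimate -- an illustrative stand-in, not the paper's
    multi-grid scheme."""
    Q = defaultdict(float)    # Q[(state, action)]
    phi = defaultdict(float)  # learned potential Phi[state]
    for _ in range(episodes):
        s = env.reset()       # assumed interface: reset() -> state
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda a_: Q[(s, a_)])
            # assumed interface: step(a) -> (next_state, reward, done)
            s_next, r, done = env.step(a)
            # shaped reward: environment reward plus potential difference
            F = shaping_reward(phi, s, s_next, gamma)
            best_next = 0.0 if done else max(Q[(s_next, a_)]
                                             for a_ in env.actions)
            Q[(s, a)] += alpha * (r + F + gamma * best_next - Q[(s, a)])
            # online update of the potential estimate (assumption)
            target = r + (0.0 if done else gamma * phi[s_next])
            phi[s] += beta * (target - phi[s])
            s = s_next
    return Q
```

Because F is a difference of potentials, the shaping terms telescope along any trajectory, so the learned policy is unaffected in the limit even while Φ is still being estimated; the intended benefit is faster convergence.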