Potential-based shaping and Q-value initialization are equivalent

Authors:
Eric Wiewiora
Affiliations:
Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA
Venue:
Journal of Artificial Intelligence Research
Year:
2003

Citing 4

The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms. Machine Learning - Special issue on reinforcement learning
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning

Cited 15

Autonomous shaping: knowledge transfer in reinforcement learning. ICML '06 Proceedings of the 23rd international conference on Machine learning
Automatic shaping and decomposition of reward functions. Proceedings of the 24th international conference on Machine learning
Social reward shaping in the prisoner's dilemma. Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 3
Co-evolution of Shaping Rewards and Meta-Parameters in Reinforcement Learning. Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems
Potential-based shaping in model-based reinforcement learning. AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
2010 Special Issue: Online learning of shaping rewards in reinforcement learning. Neural Networks
Multi-task evolutionary shaping without pre-specified representations. Proceedings of the 12th annual conference on Genetic and evolutionary computation
Darwinian embodied evolution of the learning ability for survival. Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems
Theoretical considerations of potential-based reward shaping for multi-agent systems. The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Integrating reinforcement learning with human demonstrations of varying ability. The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
Policy invariance under reward transformations for general-sum stochastic games. Journal of Artificial Intelligence Research
Reward function and initial values: better choices for accelerated goal-directed reinforcement learning. ICANN'06 Proceedings of the 16th international conference on Artificial Neural Networks - Volume Part I
Dynamic potential-based reward shaping. Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Transfer in reinforcement learning via shared features. The Journal of Machine Learning Research
Learning potential functions and their representations for multi-task reinforcement learning. Autonomous Agents and Multi-Agent Systems

Abstract

Shaping has proven to be a powerful but precarious means of improving reinforcement learning performance. Ng, Harada, and Russell (1999) proposed the potential-based shaping algorithm for adding shaping rewards in a way that guarantees the learner will learn optimal behavior. In this note, we prove certain similarities between this shaping algorithm and the initialization step required for several reinforcement learning algorithms. More specifically, we prove that a reinforcement learner with initial Q-values based on the shaping algorithm's potential function makes the same updates throughout learning as a learner receiving potential-based shaping rewards. We further prove that under a broad category of policies, the behavior of these two learners is indistinguishable. The comparison provides intuition on the theoretical properties of the shaping algorithm as well as a suggestion for a simpler method for capturing the algorithm's benefit. In addition, the equivalence raises previously unaddressed issues concerning the efficiency of learning with potential-based shaping.
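
The equivalence claimed in the abstract can be checked numerically on a toy problem. The sketch below is illustrative only and is not taken from the paper. It assumes tabular Q-learning, the standard potential-based shaping reward F(s, s') = gamma * Phi(s') - Phi(s), and a small hypothetical continuing MDP with no terminal states. The names `q_learning_step` and `compare_learners`, the random MDP, and the potential values are all made up for this example. One learner starts from zero Q-values and receives the shaped reward; the other starts from Q-values equal to Phi(s) and receives the plain reward. Both are fed the same experience.

```python
import random


def q_learning_step(q, s, a, r, s_next, alpha, gamma):
    """Apply one tabular Q-learning update in place and return the TD error."""
    td_error = r + gamma * max(q[s_next]) - q[s][a]
    q[s][a] += alpha * td_error
    return td_error


def compare_learners(n_states=5, n_actions=2, steps=2000,
                     alpha=0.1, gamma=0.9, seed=0):
    """Run a shaped learner and a potential-initialized learner side by side.

    Returns (largest TD-error difference seen, largest deviation of the final
    tables from the relation Q_init(s, a) = Q_shaped(s, a) + Phi(s)).
    """
    rng = random.Random(seed)

    # Hypothetical continuing MDP: deterministic random transitions and rewards.
    next_state = [[rng.randrange(n_states) for _ in range(n_actions)]
                  for _ in range(n_states)]
    reward = [[rng.uniform(-1.0, 1.0) for _ in range(n_actions)]
              for _ in range(n_states)]

    # Arbitrary potential function Phi(s); any values would do here.
    phi = [rng.uniform(-5.0, 5.0) for _ in range(n_states)]

    # Learner 1: zero initial Q-values, receives reward plus shaping.
    q_shaped = [[0.0] * n_actions for _ in range(n_states)]
    # Learner 2: initial Q-values set to Phi(s), receives the plain reward.
    q_init = [[phi[s]] * n_actions for s in range(n_states)]

    max_td_gap = 0.0
    s = 0
    for _ in range(steps):
        # Uniform random behavior, so both learners see identical experience.
        a = rng.randrange(n_actions)
        s_next = next_state[s][a]
        r = reward[s][a]
        shaping = gamma * phi[s_next] - phi[s]

        d_shaped = q_learning_step(q_shaped, s, a, r + shaping, s_next, alpha, gamma)
        d_init = q_learning_step(q_init, s, a, r, s_next, alpha, gamma)
        max_td_gap = max(max_td_gap, abs(d_shaped - d_init))
        s = s_next

    max_offset_gap = max(abs(q_init[i][j] - (q_shaped[i][j] + phi[i]))
                         for i in range(n_states) for j in range(n_actions))
    return max_td_gap, max_offset_gap


if __name__ == "__main__":
    td_gap, offset_gap = compare_learners()
    print("largest TD-error difference:", td_gap)
    print("largest deviation from Q_init = Q_shaped + Phi:", offset_gap)
```

Under these assumptions both gaps should stay at the level of floating-point rounding. Because the two tables differ only by a per-state constant Phi(s), the differences between action values within a state are the same for both learners, which is consistent with the abstract's claim that the two learners behave identically under a broad category of policies. Tasks with terminal states need extra care in how the terminal potential is handled, and this sketch does not cover that case.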