Potential-based reward shaping has previously been proven both to be equivalent to Q-table initialisation and to guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multi-agent reinforcement learning without consideration of whether the theoretical equivalence and guarantees still hold. This paper extends the existing proofs to similar results in multi-agent systems, providing the theoretical background that explains the success of previous empirical studies. Specifically, it is proven that the equivalence to Q-table initialisation remains and that the Nash equilibria of the underlying stochastic game are not modified. Furthermore, we demonstrate empirically that potential-based reward shaping affects exploration and, consequently, can alter the joint policy converged upon.
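To make the technique concrete, the following is a minimal sketch of potential-based reward shaping in tabular Q-learning. The shaping term follows the standard form F(s, s') = γΦ(s') − Φ(s); the grid-world potential function, goal cell, and all parameter values are illustrative assumptions, not taken from the paper.

```python
def potential(state, goal=(3, 3)):
    # Hypothetical potential Phi(s): negative Manhattan distance to a goal cell.
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))

def shaping_reward(s, s_next, gamma=0.95):
    # Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s).
    return gamma * potential(s_next) - potential(s)

def q_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    # Standard Q-learning update applied to the shaped reward r + F(s, s').
    shaped = r + shaping_reward(s, s_next, gamma)
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (shaped + gamma * best_next - old)
    return q[(s, a)]
```

Because the shaping term telescopes along any trajectory, learning with it is equivalent to initialising the Q-table with Φ(s) for every action, which is the single-agent equivalence the paper extends to the multi-agent setting.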