Multi-agent, reward shaping for RoboCup KeepAway

  • Authors:
  • Sam Devlin, Marek Grześ, Daniel Kudenko

  • Affiliations:
  • University of York, UK; University of Waterloo, Canada; University of York, UK

  • Venue:
  • The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3
  • Year:
  • 2011

Abstract

This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory [2], potential-based reward shaping does not alter the Nash equilibria of a stochastic game; it affects only the exploration of the shaped agent. We empirically evaluate state-based and state-action-based reward shaping in RoboCup KeepAway. The results illustrate that reward shaping can alter both the learning time required to reach a stable joint policy and the final group performance, for better or worse.
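
For context, potential-based reward shaping augments the environment reward with a shaping term F(s, s') = γΦ(s') − Φ(s) derived from a potential function Φ over states; the state-action variant conditions the potential on actions as well. The sketch below (Python) illustrates the state-based form inside a SARSA-style update. The potential function shown is a hypothetical placeholder, not the KeepAway potentials used in the paper.

```python
# Minimal sketch of state-based potential-based reward shaping in a
# SARSA-style tabular update. phi() is a hypothetical, domain-supplied
# potential function; the paper's KeepAway potentials are not reproduced here.

GAMMA = 0.99   # discount factor
ALPHA = 0.1    # learning rate

def phi(state):
    """Hypothetical potential encoding domain knowledge about good states."""
    return 0.0  # placeholder value

def shaping(state, next_state):
    """State-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return GAMMA * phi(next_state) - phi(state)

def sarsa_update(Q, s, a, r, s_next, a_next):
    """One SARSA step using the shaped reward r + F(s, s')."""
    shaped_r = r + shaping(s, s_next)
    td_target = shaped_r + GAMMA * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * (td_target - Q.get((s, a), 0.0))
    return Q
```

Because the shaping term telescopes over any trajectory, it changes the rewards an agent observes (and hence its exploration) without changing which joint policies form Nash equilibria of the underlying stochastic game.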