Reward shaping for valuing communications during multi-agent coordination

  • Authors:
  • Simon A. Williamson; Enrico H. Gerding; Nicholas R. Jennings

  • Affiliations:
  • University of Southampton, Southampton, UK (all authors)

  • Venue:
  • Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
  • Year:
  • 2009


Abstract

Decentralised coordination in multi-agent systems is typically achieved using communication. However, in many cases communication is expensive to use: bandwidth may be limited, communicating may be dangerous, or communication may simply be unavailable at times. In this context, we argue for a rational approach to communication: if communicating has a cost, the agents should be able to calculate its value. By doing this, the agents can balance the need to communicate with the cost of doing so. In this research, we present a novel model of rational communication that uses reward shaping to value communications, and we employ this valuation in decentralised POMDP policy generation. In this context, reward shaping is the process by which expectations over joint actions are adjusted based on how coordinated the agent team is. An empirical evaluation of the benefits of this approach is presented in two domains. First, on an idealised benchmark problem, the multi-agent Tiger problem, our method requires significantly less communication (up to 30% fewer messages) while still achieving a 30% performance improvement over the current state of the art. Second, on a larger-scale problem, RoboCupRescue, our method scales well and operates without recourse to significant amounts of domain knowledge.
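The decision rule the abstract describes can be sketched in a few lines. This is an illustrative toy, not the paper's algorithm: the function names and the scalar stand-in for the shaped valuation are assumptions, standing in for the expectation over joint actions that the paper computes via reward shaping during Dec-POMDP policy generation.

```python
# Illustrative sketch (not the authors' algorithm): a rational agent
# communicates only when the estimated value of communicating exceeds
# its cost. The valuation here is a toy stand-in for the reward-shaping
# computation: expected reward if the team re-coordinates minus expected
# reward under the agent's current, possibly miscoordinated, belief.

def value_of_communication(reward_if_coordinated: float,
                           reward_without_comm: float) -> float:
    """Toy valuation: the gain in expected reward from re-coordinating."""
    return reward_if_coordinated - reward_without_comm

def should_communicate(reward_if_coordinated: float,
                       reward_without_comm: float,
                       comm_cost: float) -> bool:
    """Send a message only when its estimated value outweighs its cost."""
    value = value_of_communication(reward_if_coordinated, reward_without_comm)
    return value > comm_cost

# Example: coordinating is worth 10 in expectation, staying silent yields 6,
# and a message costs 1, so the rational agent communicates.
print(should_communicate(10.0, 6.0, 1.0))  # True
# If silence already yields 9.5, the 0.5 gain does not cover the cost.
print(should_communicate(10.0, 9.5, 1.0))  # False
```

In the paper's setting the two expected rewards would come from the Dec-POMDP policy and the shaped reward model rather than being supplied directly; the threshold comparison is what lets agents trade message savings against coordination quality.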