CLEAN rewards for improving multiagent coordination in the presence of exploration

  • Authors:
  • Chris HolmesParker; Adrian Agogino; Kagan Tumer

  • Affiliations:
  • Oregon State University, Corvallis, Oregon, USA; NASA Ames Research Center, Moffett Field, California, USA; Oregon State University, Corvallis, Oregon, USA

  • Venue:
  • Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems
  • Year:
  • 2013

Abstract

In cooperative multiagent systems, coordinating the joint actions of agents is difficult. A fundamental source of this difficulty is the slow learning process: each agent must not only learn how to behave in a complex environment, but must also account for the actions of the other learning agents. In particular, agents cannot distinguish the true environmental dynamics from the effects of other agents' stochastic exploratory actions, which introduces noise into each agent's reward signal. To address this, we introduce Coordinated Learning without Exploratory Action Noise (CLEAN) rewards: agent-specific shaped rewards that effectively remove such exploratory noise from each agent's reward signal. We demonstrate their performance with up to 1000 agents in a standard congestion problem.
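
The abstract does not spell out how CLEAN rewards are computed, so the following is a minimal sketch of the general idea under stated assumptions: all agents act greedily in the actual system (so no exploration noise enters the environment), and each agent privately evaluates a counterfactual exploratory action against the global reward of the joint greedy action. The congestion objective G, the node-utility shape, and all names and parameters below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of CLEAN-style reward computation in a toy congestion game.
# Assumptions (not from the abstract): the global reward G, the greedy
# acting / private-exploration scheme, and all constants are illustrative.

rng = np.random.default_rng(0)

N_AGENTS, N_NODES, CAPACITY = 50, 5, 10.0

def G(joint_action):
    """Global reward: each node's contribution peaks near its capacity."""
    counts = np.bincount(joint_action, minlength=N_NODES)
    return float(np.sum(counts * np.exp(-counts / CAPACITY)))

Q = np.zeros((N_AGENTS, N_NODES))   # per-agent action-value estimates
ALPHA, EPS = 0.1, 0.1               # learning rate, private exploration rate

for episode in range(500):
    # 1) Every agent acts greedily in the real system, so no exploratory
    #    actions perturb the environment that other agents observe.
    greedy = Q.argmax(axis=1)
    g_actual = G(greedy)

    for i in range(N_AGENTS):
        # 2) Each agent privately samples an exploratory action offline.
        a_prime = rng.integers(N_NODES) if rng.random() < EPS else greedy[i]
        # 3) Counterfactual: re-evaluate G with only agent i's action changed.
        counterfactual = greedy.copy()
        counterfactual[i] = a_prime
        clean_reward = G(counterfactual) - g_actual
        # 4) Update the value of the privately explored action.
        Q[i, a_prime] += ALPHA * (clean_reward - Q[i, a_prime])

print("final global reward:", G(Q.argmax(axis=1)))
```

Because exploration happens only inside each agent's private counterfactual evaluation, the reward any one agent learns from is never perturbed by another agent's exploratory actions, which is the noise-removal effect the abstract describes.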