Solving sparse delayed coordination problems in multi-agent reinforcement learning

  • Authors:
  • Yann-Michaël De Hauwere, Peter Vrancx, Ann Nowé

  • Affiliations:
  • Computational Modeling Lab, Vrije Universiteit Brussel, Brussels, Belgium (all authors)

  • Venue:
  • ALA'11: Proceedings of the 11th International Conference on Adaptive and Learning Agents
  • Year:
  • 2011

Abstract

One of the main advantages of reinforcement learning (RL) is its ability to deal with a delayed reward signal. Through appropriate backup operations, rewards are propagated back through the state space. This allows agents to learn to take the action that yields the highest future (discounted) reward, even if that action gives a suboptimal immediate reward in the current state. In a multi-agent environment, agents can apply the same principles as in single-agent RL, but to guarantee optimality they must do so in the complete joint-state joint-action space. Learning in such a space can, however, be very slow. In this paper we present our approach for mitigating this problem. Future Coordinating Q-learning (FCQ-learning) detects strategic interactions between agents several timesteps before these interactions occur. FCQ-learning uses the same principles as CQ-learning [3] to detect the states in which interaction is required, but does so several timesteps before the need for coordination is reflected in the reward signal. In these states, the algorithm augments the agent's local state information with information about other agents, which is then used to select actions. The techniques presented in this paper are the first to explicitly deal with a delayed reward signal when learning with sparse interactions.
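
The following is a minimal sketch of the state-augmentation idea described in the abstract, assuming a tabular Q-learner and a single other agent. The class and method names (SparseInteractionAgent, flag_coordination_state, and so on) are illustrative assumptions, not the paper's algorithm or API: the agent learns over its own local state by default and switches to an augmented (local + other-agent) state only in states that have been flagged as requiring coordination.

```python
# Hypothetical sketch of learning with sparse interactions; names are
# illustrative and do not come from the FCQ-learning paper.
from collections import defaultdict
import random


class SparseInteractionAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # Q-values over the agent's own (local) state.
        self.q_local = defaultdict(float)   # key: (local_state, action)
        # Q-values over augmented states, used only where coordination matters.
        self.q_joint = defaultdict(float)   # key: (local_state, other_state, action)
        # Local states in which the agent has decided it must coordinate.
        self.coordination_states = set()

    def flag_coordination_state(self, local_state):
        # In FCQ-learning this decision is made from statistics on the
        # expected returns, several timesteps before the conflict shows up
        # in the rewards; here it is simply an external call.
        self.coordination_states.add(local_state)

    def _q(self, local_state, other_state, action):
        if local_state in self.coordination_states:
            return self.q_joint[(local_state, other_state, action)]
        return self.q_local[(local_state, action)]

    def select_action(self, local_state, other_state):
        # Epsilon-greedy over whichever Q-table applies in this state.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions,
                   key=lambda a: self._q(local_state, other_state, a))

    def update(self, local_state, other_state, action, reward,
               next_local_state, next_other_state):
        # Standard Q-learning backup; the delayed reward is propagated
        # backwards through the (possibly augmented) state space.
        best_next = max(self._q(next_local_state, next_other_state, a)
                        for a in self.actions)
        target = reward + self.gamma * best_next
        if local_state in self.coordination_states:
            key = (local_state, other_state, action)
            self.q_joint[key] += self.alpha * (target - self.q_joint[key])
        else:
            key = (local_state, action)
            self.q_local[key] += self.alpha * (target - self.q_local[key])
```

The design point this sketch tries to capture is that the joint table is only consulted in flagged states, so the size of the learned model stays close to that of single-agent learning except where coordination is actually needed.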