Asynchronous Stochastic Approximation and Q-Learning. Machine Learning.
The dynamics of reinforcement learning in cooperative multiagent systems. AAAI '98/IAAI '98: Proceedings of the Fifteenth National Conference on Artificial Intelligence/Tenth Conference on Innovative Applications of Artificial Intelligence.
Introduction to Reinforcement Learning.
Nash q-learning for general-sum stochastic games. The Journal of Machine Learning Research.
ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning.
Planning, learning and coordination in multiagent decision processes. TARK '96: Proceedings of the 6th Conference on Theoretical Aspects of Rationality and Knowledge.
Learning of coordination: exploiting sparse interactions in multiagent systems. Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Volume 2.
Learning multi-agent state space representations. Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Volume 1.
Decentralized Learning in Markov Games. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics.
One of the main advantages of reinforcement learning is its ability to deal with a delayed reward signal. Using an appropriate backup diagram, rewards are propagated back through the state space. This allows agents to learn to take the action that yields the highest future (discounted) reward, even if that action yields a suboptimal immediate reward in the current state. In a multi-agent environment, agents can apply the same principles as in single-agent RL, but must do so in the complete joint-state/joint-action space to guarantee optimality. Learning in such a state space, however, can be very slow. In this paper we present our approach for mitigating this problem. Future Coordinating Q-learning (FCQ-learning) detects strategic interactions between agents several time steps before these interactions occur. FCQ-learning uses the same principles as CQ-learning [3] to detect the states in which interaction is required, but does so several time steps before the interaction is reflected in the reward signal. In these states, the algorithm augments the agent's state information with information about the other agents, and this augmented state is used to select actions. The techniques presented in this paper are the first to explicitly deal with a delayed reward signal when learning with sparse interactions.
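To make the mechanism concrete, below is a minimal sketch of this kind of sparse-interaction learner. It is an illustration under stated assumptions, not the paper's implementation: the class name SparseInteractionAgent, the flag_interaction hook, and the way the other agent's local state is folded into the augmented state are all hypothetical, and the detection step that FCQ-learning performs on the learning signal is reduced to a stub. The standard Q-learning backup in update is what propagates delayed rewards back through the state space.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate

class SparseInteractionAgent:
    # Hypothetical sketch in the spirit of CQ-/FCQ-learning: the agent normally
    # learns over its own local state and only switches to an augmented
    # (local state + other agent's state) representation in states flagged
    # as requiring coordination.
    def __init__(self, actions):
        self.actions = actions
        self.q_local = defaultdict(float)   # Q-values over (local_state, action)
        self.q_joint = defaultdict(float)   # Q-values over ((local, other), action)
        self.augmented = set()              # local states flagged as interaction states

    def _table_and_key(self, state, other_state):
        # Use the augmented (joint) representation only where coordination was detected.
        if state in self.augmented:
            return self.q_joint, (state, other_state)
        return self.q_local, state

    def act(self, state, other_state):
        table, key = self._table_and_key(state, other_state)
        if random.random() < EPSILON:       # epsilon-greedy exploration
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: table[(key, a)])

    def update(self, state, other_state, action, reward, next_state, next_other):
        # Standard Q-learning backup: the discounted maximum over the next
        # state propagates delayed rewards back through the state space.
        table, key = self._table_and_key(state, other_state)
        next_table, next_key = self._table_and_key(next_state, next_other)
        best_next = max(next_table[(next_key, a)] for a in self.actions)
        table[(key, action)] += ALPHA * (reward + GAMMA * best_next - table[(key, action)])

    def flag_interaction(self, state):
        # Stub for FCQ-learning's detection step: per the abstract, detection is
        # driven by the learning signal several time steps before the conflict
        # appears in the rewards; here it is simply an external hook.
        self.augmented.add(state)

In this sketch, detection is left external; the point of FCQ-learning is precisely to drive flag_interaction automatically, and early, from the learning signal itself, so that the more expensive augmented representation is only used in the few states where coordination actually matters.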