This article shows that seemingly diverse implementations of multi-agent reinforcement learning share the same basic building block in their learning dynamics: a mathematical term that is closely related to the gradient of the expected reward. Gradient Ascent on the expected reward has been used to derive strong convergence results in two-player two-action games, at the expense of strong assumptions such as full information about the game being played. Variations of Gradient Ascent, such as Infinitesimal Gradient Ascent (IGA), Win-or-Learn-Fast IGA (WoLF-IGA), and Weighted Policy Learning (WPL), assume a known value function for which the gradient can be computed directly. In contrast, independent multi-agent reinforcement learning algorithms that assume less information about the game being played, such as Cross learning, variations of Q-learning, and regret minimization, base their learning on feedback from discrete interactions with the environment, requiring neither an explicit representation of the value function nor of its gradient. Despite this much stricter limitation on the information available to them, these algorithms yield dynamics that are very similar to Gradient Ascent and exhibit equivalent convergence behavior. In addition to the formal derivation, directional field plots of the learning dynamics in representative classes of two-player two-action games illustrate the similarities and strengthen the theoretical findings.
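To make the shared building block concrete, the following is a brief worked sketch in standard notation from the Gradient Ascent literature; the symbols (row player's payoff matrix $A = (a_{ij})$, policies $\alpha$ and $\beta$, step size $\eta$) are conventional assumptions chosen for illustration and are not drawn from the article itself. In a two-player two-action game where the row player selects action 1 with probability $\alpha$ and the column player with probability $\beta$, the row player's expected reward and its gradient are

\[
V_r(\alpha, \beta) = \alpha\beta\, a_{11} + \alpha(1-\beta)\, a_{12} + (1-\alpha)\beta\, a_{21} + (1-\alpha)(1-\beta)\, a_{22},
\qquad
\frac{\partial V_r}{\partial \alpha} = \beta u + (a_{12} - a_{22}),
\]

with $u = a_{11} - a_{12} - a_{21} + a_{22}$. IGA follows this gradient directly, $\alpha_{k+1} = \alpha_k + \eta\, \partial V_r / \partial \alpha$, whereas the expected dynamics of Cross learning are given by the replicator equation $\dot{x}_i = x_i \left[ (Ay)_i - x^{\top} A y \right]$, which in the two-action case reduces to $\dot{x}_1 = x_1 (1 - x_1)\, \partial V_r / \partial \alpha$: the same gradient term, merely rescaled by a policy-dependent factor, even though Cross learning never represents the value function or its gradient explicitly.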