Several multiagent reinforcement learning (MARL) algorithms have been proposed to optimize agents' decisions. Because of the complexity of the problem, most previously developed MARL algorithms assumed that agents either have some knowledge of the underlying game (such as its Nash equilibria) or can observe other agents' actions and the rewards they received, or both. We introduce a new MARL algorithm called the Weighted Policy Learner (WPL), which allows agents to reach a Nash equilibrium (NE) in benchmark 2-player-2-action games with minimal knowledge. Using WPL, the only feedback an agent needs is its own local reward (the agent does not observe other agents' actions or rewards). Furthermore, WPL does not assume that agents know the underlying game or the corresponding Nash equilibrium a priori. We experimentally show that our algorithm converges in benchmark two-player-two-action games. We also show that it converges in the challenging Shapley's game, where previous MARL algorithms failed to converge without knowing the underlying game or the NE. Furthermore, we show that WPL outperforms state-of-the-art algorithms in a more realistic setting of 100 agents interacting and learning concurrently.

An important aspect of understanding the behavior of a MARL algorithm is analyzing its dynamics: how the policies of multiple learning agents evolve over time as the agents interact with one another. Such an analysis not only verifies whether agents using a given MARL algorithm will eventually converge, but also reveals the behavior of the algorithm prior to convergence. We analyze our algorithm in two-player-two-action games and show that symbolically proving WPL's convergence is difficult because of the non-linear nature of its dynamics, unlike previous MARL algorithms whose dynamics were linear or piece-wise linear. Instead, we numerically solve the differential equations describing WPL's dynamics and compare the solution to the dynamics of previous MARL algorithms.
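To make the setting concrete, below is a minimal, hypothetical sketch of a weighted policy-gradient learner of the kind the abstract describes, applied to the matching-pennies benchmark. The specific weighting (scaling a positive gradient estimate by 1 - pi(a) and a negative one by pi(a)), the learning rates, and the helper names are illustrative assumptions, not details taken from the paper. Each agent updates its policy from its own sampled reward only, without observing the other agent's action or reward.

```python
import numpy as np

# Illustrative sketch (not the authors' exact rule): two independent learners
# in matching pennies, each observing only its own reward, updating a mixed
# policy with a probability-weighted gradient step.

# Payoff for the row player in matching pennies; the column player gets the negative.
ROW_PAYOFF = np.array([[1.0, -1.0],
                       [-1.0, 1.0]])

def project_to_simplex(p, eps=1e-3):
    """Clip and renormalize so the policy remains a valid distribution."""
    p = np.clip(p, eps, 1.0 - eps)
    return p / p.sum()

def weighted_gradient_step(policy, q_values, eta=0.01):
    """Assumed weighting: scale a positive per-action gradient by (1 - pi(a))
    and a negative one by pi(a), then take a small gradient-ascent step."""
    value = policy @ q_values
    grad = q_values - value
    weights = np.where(grad > 0, 1.0 - policy, policy)
    return project_to_simplex(policy + eta * grad * weights)

rng = np.random.default_rng(0)
pi_row = np.array([0.9, 0.1])   # deliberately far from the mixed NE (0.5, 0.5)
pi_col = np.array([0.2, 0.8])
q_row, q_col = np.zeros(2), np.zeros(2)
alpha = 0.05                     # learning rate for the local reward estimates

for _ in range(20000):
    a = rng.choice(2, p=pi_row)
    b = rng.choice(2, p=pi_col)
    r_row = ROW_PAYOFF[a, b]     # each agent sees only its own reward
    r_col = -r_row
    q_row[a] += alpha * (r_row - q_row[a])
    q_col[b] += alpha * (r_col - q_col[b])
    pi_row = weighted_gradient_step(pi_row, q_row)
    pi_col = weighted_gradient_step(pi_col, q_col)

print("row policy:", pi_row)     # expected to approach (0.5, 0.5) under these assumptions
print("col policy:", pi_col)
```

Under these assumed settings, one would expect the two policies to drift toward the mixed equilibrium (0.5, 0.5) rather than cycle indefinitely, which is the qualitative behavior the abstract claims for WPL in 2-player-2-action games; the paper's actual algorithm and its dynamics analysis should be consulted for the precise update rule.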