Theoretical advantages of lenient Q-learners: an evolutionary game theoretic perspective

  • Authors:
  • Liviu Panait; Karl Tuyls

  • Affiliations:
  • Google Inc., Santa Monica, CA; Maastricht University, MiCC-IKAT, The Netherlands

  • Venue:
  • Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems
  • Year:
  • 2007


Abstract

This paper presents the dynamics of multiple reinforcement learning agents from an Evolutionary Game Theoretic (EGT) perspective. We provide a Replicator Dynamics model for traditional multiagent Q-learning, and we extend these differential equations to account for lenient learners: agents that forgive possible mistakes of their teammates that resulted in lower rewards. We use this extended formal model to visualize the basins of attraction of both traditional and lenient multiagent Q-learners in two benchmark coordination problems. The results indicate that lenience provides learners with more accurate estimates of the utility of their actions, resulting in a higher likelihood of convergence to the globally optimal solution. In addition, our research supports the strength of EGT as a backbone for multiagent reinforcement learning.
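
The abstract describes lenient learners as agents that forgive teammate mistakes which produced low rewards, i.e. an action's utility estimate is updated from the best of several collected rewards rather than from every sample. The sketch below illustrates that idea in Python. It is a minimal illustration only: the abstract does not name the two benchmark coordination problems or give the exact update rule, so the climbing game payoff matrix, the per-action reward buffer of size `kappa`, and all parameter values are assumptions made for this example, not the paper's implementation.

```python
import numpy as np

# Illustrative payoff matrix (climbing game, a common coordination benchmark;
# the abstract does not name the paper's actual benchmark problems).
CLIMBING = np.array([[ 11, -30,   0],
                     [-30,   7,   6],
                     [  0,   0,   5]], dtype=float)

def boltzmann(q, tau):
    """Softmax (Boltzmann) action-selection probabilities for Q-values q."""
    p = np.exp(q / tau)
    return p / p.sum()

def run(kappa=1, alpha=0.1, tau=0.5, steps=50000, seed=0):
    """Two independent Q-learners on the climbing game.

    kappa = 1 behaves like a traditional Q-learner; kappa > 1 is a lenient
    learner that updates an action only after collecting kappa rewards for it
    and then uses the maximum of those rewards."""
    rng = np.random.default_rng(seed)
    q = [np.zeros(3), np.zeros(3)]                    # Q-values: row, column player
    buf = [[[] for _ in range(3)] for _ in range(2)]  # per-agent, per-action reward buffers
    for _ in range(steps):
        a = [rng.choice(3, p=boltzmann(q[0], tau)),
             rng.choice(3, p=boltzmann(q[1], tau))]
        r = CLIMBING[a[0], a[1]]                      # shared joint reward
        for i in range(2):
            buf[i][a[i]].append(r)
            # Lenience step: update only once kappa rewards were collected for
            # this action, using the maximum, which ignores low rewards caused
            # by the teammate's exploratory mistakes.
            if len(buf[i][a[i]]) >= kappa:
                q[i][a[i]] += alpha * (max(buf[i][a[i]]) - q[i][a[i]])
                buf[i][a[i]].clear()
    return int(np.argmax(q[0])), int(np.argmax(q[1]))

if __name__ == "__main__":
    print("traditional:", run(kappa=1))   # tends toward the safe joint action
    print("lenient:    ", run(kappa=10))  # more often reaches the optimal (0, 0)
```

With `kappa = 1` this reduces to an ordinary Q-learner; increasing `kappa` mimics the lenience the abstract credits with more accurate utility estimates and a higher likelihood of converging to the globally optimal joint action.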