Dynamic analysis of multiagent Q-learning with ε-greedy exploration

Authors:
Eduardo Rodrigues Gomes;Ryszard Kowalczyk
Affiliations:
Swinburne University of Technology, Hawthorn, VIC, Australia;Swinburne University of Technology, Hawthorn, VIC, Australia
Venue:
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Year:
2009

Abstract

The development of mechanisms to understand and model the expected behaviour of multiagent learners is becoming increasingly important as the area rapidly finds application in a variety of domains. In this paper we present a framework to model the behaviour of Q-learning agents that use the ε-greedy exploration mechanism. To this end, we analyse a continuous-time version of the Q-learning update rule and study how the presence of other agents and the ε-greedy mechanism affect it. We then model the problem as a system of difference equations, which is used to theoretically analyse the expected behaviour of the agents. The applicability of the framework is tested through experiments in typical games selected from the literature.
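For readers unfamiliar with the setting, the following is a minimal sketch of two independent Q-learners with ε-greedy exploration playing a repeated two-player matrix game. It is not the authors' framework or code. The function names, the stateless (single-state) formulation, the parameter values, and the Prisoner's Dilemma payoffs are all assumptions chosen for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties between equally good actions at random.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def q_update(q_values, action, reward, alpha, gamma):
    """Stateless Q-learning update: Q(a) += alpha * (r + gamma * max Q - Q(a))."""
    target = reward + gamma * max(q_values)
    q_values[action] += alpha * (target - q_values[action])


def run(payoffs, steps=5000, alpha=0.1, gamma=0.0, epsilon=0.1, seed=0):
    """Simulate two independent learners; payoffs[a][b] = (row reward, column reward).

    gamma=0.0 is an illustrative assumption, not a setting taken from the paper.
    """
    rng = random.Random(seed)
    q_row = [0.0] * len(payoffs)
    q_col = [0.0] * len(payoffs[0])
    for _ in range(steps):
        a = epsilon_greedy(q_row, epsilon, rng)
        b = epsilon_greedy(q_col, epsilon, rng)
        r_row, r_col = payoffs[a][b]
        # Each agent updates only its own table; the other agent is part of
        # the environment, which is what makes the learning problem non-stationary.
        q_update(q_row, a, r_row, alpha, gamma)
        q_update(q_col, b, r_col, alpha, gamma)
    return q_row, q_col


# Hypothetical Prisoner's Dilemma payoffs (action 0 = cooperate, 1 = defect).
PRISONERS_DILEMMA = [
    [(3, 3), (0, 5)],
    [(5, 0), (1, 1)],
]

if __name__ == "__main__":
    q_row, q_col = run(PRISONERS_DILEMMA)
    print("row agent Q-values:", q_row)
    print("column agent Q-values:", q_col)
```

A simulation like this produces one sample trajectory per seed. The framework described in the abstract instead targets the expected behaviour of such agents, so averaging many seeded runs would be the natural way to compare a simulation against a theoretical prediction.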