Multiagent learning in adaptive dynamic systems

Authors:
Andriy Burkov; Brahim Chaib-draa
Affiliations:
Laval University, Quebec, Canada; Laval University, Quebec, Canada
Venue:
Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems
Year:
2007

Abstract

Classically, approaches to multiagent policy learning assumed that the agents, through interaction and/or prior knowledge of all players' reward functions, would find an interdependent solution called an "equilibrium". Recently, however, some researchers have questioned the necessity and validity of equilibrium as the most important multiagent solution concept. They argue that a "good" learning algorithm is one that is efficient with respect to a certain class of counterparts. Adaptive players are an important class of agents that learn their policies separately from maintaining beliefs about their counterparts' future actions, and make their decisions based on that policy and their current beliefs. In this paper, we propose Adaptive Dynamics Learner (ADL), an algorithm for learning efficiently in the presence of adaptive counterparts. ADL learns an efficient policy over the opponents' adaptive dynamics rather than over simple actions and beliefs and, by doing so, exploits these dynamics to obtain a higher utility than any equilibrium strategy can provide. We tested our algorithm on a substantial, representative set of well-known and illustrative matrix games and observed that the ADL agent is highly efficient against Adaptive Play Q-learning (APQ) and Infinitesimal Gradient Ascent (IGA) agents. In self-play, when possible, ADL is able to converge to a Pareto-optimal strategy that maximizes the welfare of all players.
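The decision rule attributed to adaptive players above (a learned policy kept separate from a belief about the counterpart, the two combined only when acting) can be sketched in Python. This is a minimal, hypothetical illustration for a stateless repeated matrix game, not the APQ agent evaluated in the paper: the class `AdaptivePlayer`, its parameters `memory` and `alpha`, and the use of the empirical frequency of the opponent's last few actions as the belief are all assumptions made for the example.

```python
from collections import Counter, deque


class AdaptivePlayer:
    """Hypothetical adaptive player for a repeated two-player matrix game.

    The learned part (joint-action values) is kept separate from the belief
    about the opponent; the two are combined only when choosing an action.
    """

    def __init__(self, my_actions, opp_actions, memory=5, alpha=0.1):
        self.my_actions = list(my_actions)
        self.opp_actions = list(opp_actions)
        self.alpha = alpha  # learning rate for the value estimates (assumed)
        # q[(a, b)]: estimated reward for playing a while the opponent plays b
        self.q = {(a, b): 0.0 for a in self.my_actions for b in self.opp_actions}
        # Belief source: only the opponent's last `memory` actions are kept
        self.history = deque(maxlen=memory)

    def belief(self):
        """Empirical frequency of the opponent's recent actions (uniform before any play)."""
        if not self.history:
            return {b: 1.0 / len(self.opp_actions) for b in self.opp_actions}
        counts = Counter(self.history)
        n = len(self.history)
        return {b: counts[b] / n for b in self.opp_actions}

    def act(self):
        """Best response to the current belief; ties go to the first listed action."""
        p = self.belief()

        def expected_value(a):
            return sum(p[b] * self.q[(a, b)] for b in self.opp_actions)

        return max(self.my_actions, key=expected_value)

    def update(self, my_action, opp_action, reward):
        """Update the belief window and the joint-action value independently."""
        self.history.append(opp_action)
        key = (my_action, opp_action)
        self.q[key] += self.alpha * (reward - self.q[key])
```

Because such a player's next action is a deterministic function of a short window of past play, a counterpart that treats that window as a state can, in principle, predict its responses; this is the sense in which the abstract describes ADL as learning over adaptive dynamics rather than over single actions and beliefs.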
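IGA, the other counterpart named in the abstract, is usually formulated for 2x2 games as gradient ascent on each player's expected payoff, where a player's strategy is the probability of playing its first action. The sketch below assumes that formulation with a fixed step size and projection onto [0, 1]; the function name `iga_step`, the parameter `eta`, and the Prisoner's Dilemma payoffs used in the example are illustrative assumptions, not taken from the paper.

```python
def iga_step(alpha, beta, R, C, eta=0.01):
    """One simultaneous, projected gradient-ascent step for a 2x2 game.

    alpha: probability that the row player plays action 0
    beta:  probability that the column player plays action 0
    R, C:  row and column payoff matrices, indexed [row_action][col_action]
    eta:   step size (an assumed constant)
    """
    # Row player's gradient w.r.t. alpha:
    # dV_r/dalpha = beta * u_r + (R[0][1] - R[1][1]), u_r = R00 - R01 - R10 + R11
    u_r = R[0][0] - R[0][1] - R[1][0] + R[1][1]
    grad_alpha = beta * u_r + (R[0][1] - R[1][1])
    # Column player's gradient w.r.t. beta:
    # dV_c/dbeta = alpha * u_c + (C[1][0] - C[1][1]), u_c = C00 - C01 - C10 + C11
    u_c = C[0][0] - C[0][1] - C[1][0] + C[1][1]
    grad_beta = alpha * u_c + (C[1][0] - C[1][1])

    def project(x):
        # Keep each strategy a valid probability
        return min(1.0, max(0.0, x))

    # Both players step from the same (alpha, beta): a simultaneous update
    return project(alpha + eta * grad_alpha), project(beta + eta * grad_beta)


# Prisoner's Dilemma, action 0 = cooperate, action 1 = defect (assumed payoffs)
R = [[3, 0], [5, 1]]
C = [[3, 5], [0, 1]]

alpha, beta = 1.0, 1.0  # both players start fully cooperative
for _ in range(100):
    alpha, beta = iga_step(alpha, beta, R, C, eta=0.1)
print(alpha, beta)  # both strategies end at 0.0 (always defect)
```

With these payoffs the row gradient is -beta - 1 (and symmetrically -alpha - 1 for the column player), which is always negative, so both learners drift to mutual defection, the game's only Nash equilibrium. The learner's trajectory is fully determined by its update rule, which is the kind of predictable adaptive dynamics the abstract says ADL exploits.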