Generalized multiagent learning with performance bound

  • Authors:
  • Bikramjit Banerjee; Jing Peng

  • Affiliations:
  • Department of Electrical Engineering & Computer Science, Tulane University, New Orleans, LA 70118, USA (both authors)

  • Venue:
  • Autonomous Agents and Multi-Agent Systems
  • Year:
  • 2007

Abstract

We present new multiagent learning (MAL) algorithms with the general philosophy of policy convergence against some classes of opponents but otherwise ensuring high payoffs. We consider a three-class breakdown of opponent types: (eventually) stationary agents, self-play, and "other" agents (see Definition 4). We start with ReDVaLeR, which can achieve policy convergence against the first two types and no-regret against the third, but needs to know the type of each opponent. This serves as a baseline that delineates the difficulty of achieving these goals. We show that a simple modification of ReDVaLeR yields a new algorithm, RVσ(t), that achieves no-regret payoffs in all games and convergence to Nash equilibria in self-play (and to best response against eventually stationary opponents, a corollary of no-regret) simultaneously, without knowing the opponent types, but in a smaller class of games than ReDVaLeR. RVσ(t) effectively ensures the performance of the learner during the process of learning, as opposed to the performance of the behavior eventually learned. We show that the regret expression for RVσ(t) can take a slightly better form than those of comparable algorithms such as GIGA and GIGA-WoLF, though, in contrast to theirs, our analysis is in continuous time. Moreover, experiments show that RVσ(t) can converge to an equilibrium in some cases where GIGA and GIGA-WoLF fail to converge, and to better equilibria in cases (coordination games) where GIGA and GIGA-WoLF converge to undesirable equilibria. This important class of coordination games also highlights why policy convergence, rather than high average payoff alone, is a key criterion for MAL in self-play. To our knowledge, this is also the first successful (guaranteed) attempt at policy convergence of a no-regret algorithm in the Shapley game.
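
To make the experimental setting concrete, the sketch below is an illustrative, hypothetical example rather than the paper's ReDVaLeR or RVσ(t): a standard GIGA-style projected-gradient learner (Zinkevich's update x_{t+1} = Π_Δ(x_t + η_t · gradient) with step size η_t = 1/√t) run in self-play on the Shapley game, the 3×3 benchmark mentioned in the abstract, whose unique Nash equilibrium is (1/3, 1/3, 1/3) for both players. All function names and parameter choices here are assumptions made for the example, and it uses the expected-payoff gradient rather than a sampled payoff vector.

```python
# Illustrative sketch only (an assumption for exposition, NOT the paper's
# ReDVaLeR or RVσ(t)): a GIGA-style projected-gradient learner in self-play
# on Shapley's 3x3 game.  GIGA's update (Zinkevich 2003) is
#     x_{t+1} = proj_simplex(x_t + eta_t * gradient),  eta_t = 1/sqrt(t).
import numpy as np

# Shapley's game: diagonal payoffs (0,0); off-diagonal entries alternate
# between (1,0) and (0,1).  Unique Nash equilibrium: (1/3, 1/3, 1/3) each.
A = np.array([[0., 1., 0.],    # row player's payoff matrix
              [0., 0., 1.],
              [1., 0., 0.]])
B = np.array([[0., 0., 1.],    # column player's payoff matrix
              [1., 0., 0.],
              [0., 1., 0.]])

def proj_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / idx > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def giga_self_play(T=20000):
    x = np.ones(3) / 3.0   # row player's mixed strategy
    y = np.ones(3) / 3.0   # column player's mixed strategy
    for t in range(1, T + 1):
        eta = 1.0 / np.sqrt(t)
        gx = A @ y           # gradient of x^T A y with respect to x
        gy = B.T @ x         # gradient of x^T B y with respect to y
        x = proj_simplex(x + eta * gx)
        y = proj_simplex(y + eta * gy)
    return x, y

if __name__ == "__main__":
    x, y = giga_self_play()
    print("row strategy after T steps   :", np.round(x, 3))
    print("column strategy after T steps:", np.round(y, 3))
```

Per the abstract, GIGA and GIGA-WoLF are not guaranteed to converge in policy in this game; that is precisely the gap RVσ(t) is designed to close while retaining the no-regret guarantee.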