Unifying convergence and no-regret in multiagent learning

Authors:
Bikramjit Banerjee; Jing Peng
Affiliations:
Department of Electrical Engineering & Computer Science, Tulane University, New Orleans, LA (both authors)
Venue:
LAMAS'05 Proceedings of the First international conference on Learning and Adaption in Multi-Agent Systems
Year:
2005

Abstract

We present a new multiagent learning algorithm, RVσ(t), that builds on an earlier version, ReDVaLeR. ReDVaLeR could guarantee (a) convergence to best response against stationary opponents and either (b) constant-bounded regret against arbitrary opponents, or (c) convergence to Nash equilibrium policies in self-play. But it makes two strong assumptions: (1) that it can distinguish between self-play and other non-stationary agents, and (2) that all agents know their portions of the same equilibrium in self-play. We show that the adaptive learning rate of RVσ(t), which depends explicitly on time, can overcome both of these assumptions. Consequently, RVσ(t) theoretically achieves (a') convergence to near-best response against eventually stationary opponents, (b') no-regret payoff against arbitrary opponents, and (c') convergence to some Nash equilibrium policy in some classes of games, in self-play. Each agent now needs to know only its own portion of any equilibrium (not necessarily the same equilibrium as the other agents), and does not need to distinguish among non-stationary opponent types. This is also the first successful attempt (to our knowledge) at convergence of a no-regret algorithm in the Shapley game.
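
For readers unfamiliar with the term, the "no-regret" property in (b') can be illustrated with a small sketch. The code below is not RVσ(t) or ReDVaLeR. It only computes the standard notion of average external regret in a repeated matrix game: the per-round gap between the best fixed action in hindsight and the payoff actually obtained. The function name `average_regret`, the payoff-matrix layout, and the matching-pennies example are assumptions made for illustration, and the paper's formal definition may differ in detail. A learner is usually called no-regret when this quantity tends to zero (or below) as the number of rounds grows, whatever the opponent does.

```python
from typing import Sequence


def average_regret(
    payoff: Sequence[Sequence[float]],
    my_actions: Sequence[int],
    opp_actions: Sequence[int],
) -> float:
    """Average external regret of a play history in a repeated matrix game.

    payoff[a][b] is the learner's payoff when it plays action a and the
    opponent plays action b (hypothetical layout chosen for this sketch).
    """
    if len(my_actions) != len(opp_actions) or not my_actions:
        raise ValueError("histories must be non-empty and of equal length")

    rounds = len(my_actions)

    # Payoff the learner actually collected over the whole history.
    realized = sum(payoff[a][b] for a, b in zip(my_actions, opp_actions))

    # Payoff of the best single action, had it been played in every round
    # against the same opponent history.
    best_fixed = max(
        sum(payoff[a][b] for b in opp_actions) for a in range(len(payoff))
    )

    return (best_fixed - realized) / rounds


# Matching pennies from the row player's point of view (illustrative only).
MATCHING_PENNIES = [[1.0, -1.0], [-1.0, 1.0]]

# Always mismatching the opponent is the worst case: regret of 2 per round.
worst_case = average_regret(MATCHING_PENNIES, [0, 0, 0, 0], [1, 1, 1, 1])

# Always matching the opponent leaves nothing to regret.
best_case = average_regret(MATCHING_PENNIES, [1, 1, 1, 1], [1, 1, 1, 1])
```

In this example `worst_case` is 2.0 and `best_case` is 0.0. The sketch measures regret after the fact; it says nothing about how a learning rule would keep that quantity small.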