The weighted majority algorithm
Information and Computation
Multiagent learning using a variable learning rate
Artificial Intelligence
Friend-or-Foe Q-learning in General-Sum Games
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
On No-Regret Learning, Fictitious Play, and Nash Equilibrium
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Gambling in a rigged casino: The adversarial multi-armed bandit problem
FOCS '95 Proceedings of the 36th Annual Symposium on Foundations of Computer Science
Nash q-learning for general-sum stochastic games
The Journal of Machine Learning Research
Best-Response Multiagent Learning in Non-Stationary Environments
AAMAS '04 Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 2
Performance bounded reinforcement learning in strategic interactions
AAAI'04 Proceedings of the 19th national conference on Artifical intelligence
Rational and convergent learning in stochastic games
IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
Nash convergence of gradient dynamics in general-sum games
UAI'00 Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence
Hi-index | 0.00 |
We present a new multiagent learning algorithm, RVσ(t), that builds on an earlier version, ReDVaLeR . ReDVaLeR could guarantee (a) convergence to best response against stationary opponents and either (b) constant bounded regret against arbitrary opponents, or (c) convergence to Nash equilibrium policies in self-play. But it makes two strong assumptions: (1) that it can distinguish between self-play and otherwise non-stationary agents and (2) that all agents know their portions of the same equilibrium in self-play. We show that the adaptive learning rate of RVσ(t) that is explicitly dependent on time can overcome both of these assumptions. Consequently, RVσ(t) theoretically achieves (a') convergence to near-best response against eventually stationary opponents, (b') no-regret payoff against arbitrary opponents and (c') convergence to some Nash equilibrium policy in some classes of games, in self-play. Each agent now needs to know its portion of any equilibrium, and does not need to distinguish among non-stationary opponent types. This is also the first successful attempt (to our knowledge) at convergence of a no-regret algorithm in the Shapley game.