We present new multiagent learning (MAL) algorithms with the general philosophy of achieving policy convergence against some classes of opponents while otherwise ensuring high payoffs. We consider a three-class breakdown of opponent types: (eventually) stationary, self-play, and "other" (see Definition 4) agents. We start with ReDVaLeR, which achieves policy convergence against the first two types and no-regret payoffs against the third, but requires knowing the opponents' types. This serves as a baseline that delineates the difficulty of achieving these goals. We then show that a simple modification of ReDVaLeR yields a new algorithm, RVσ(t), that simultaneously achieves no-regret payoffs in all games and convergence to Nash equilibria in self-play (and, as a corollary of no-regret, convergence to best response against eventually stationary opponents), without knowing the opponent types, though in a smaller class of games than ReDVaLeR. RVσ(t) effectively ensures the performance of the learner during the process of learning, as opposed to the performance of the learned behavior. We show that the regret bound of RVσ(t) can take a slightly better form than those of comparable algorithms such as GIGA and GIGA-WoLF, although, in contrast, our analysis is in continuous time. Moreover, experiments show that RVσ(t) can converge to an equilibrium in some cases where GIGA and GIGA-WoLF fail, and to better equilibria in coordination games where GIGA and GIGA-WoLF converge to undesirable equilibria. This important class of coordination games also highlights why policy convergence, rather than high average payoff, is the key criterion for MAL in self-play. To our knowledge, this is also the first successful (guaranteed) attempt at policy convergence of a no-regret algorithm in the Shapley game.
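To make the baseline concrete, the following is a minimal sketch (not the paper's RVσ(t)) of the GIGA-style learner the abstract compares against: projected gradient ascent on a mixed policy in a repeated matrix game, with a decaying step size η/√t. Against an eventually stationary opponent, such a no-regret learner's policy converges to a best response. The game (matching pennies), the fixed opponent policy, and the step-size constant are illustrative assumptions, not values from the paper.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection of v onto the probability simplex.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

# Matching pennies payoffs for the row player (illustrative game).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

x = np.array([1.0, 0.0])    # learner's mixed policy (starts pure)
y = np.array([0.25, 0.75])  # assumed stationary opponent policy
eta = 1.0                   # assumed step-size constant

for t in range(1, 5001):
    grad = A @ y                                    # payoff gradient w.r.t. x
    x = project_simplex(x + (eta / np.sqrt(t)) * grad)  # GIGA-style update

# Against y = (0.25, 0.75), the second action earns 0.5 > -0.5,
# so the policy should concentrate on action 1 (the best response).
print(x)
```

RVσ(t), as described in the abstract, augments this kind of no-regret gradient dynamics so that policies also converge in self-play, which plain GIGA does not guarantee (e.g., in the Shapley game).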