Two minimal requirements for a satisfactory multiagent learning algorithm are that it (1) learns to play optimally against stationary opponents and (2) converges to a Nash equilibrium in self-play. The previous algorithm that came closest, WoLF-IGA, has been proven to have these two properties in 2-player, 2-action (repeated) games, assuming that the opponent's mixed strategy is observable. Another algorithm, ReDVaLeR (introduced after the algorithm described in this paper), achieves the two properties in games with arbitrary numbers of actions and players, but still requires that the opponents' mixed strategies be observable. In this paper we present AWESOME, the first algorithm guaranteed to have both properties in games with arbitrary numbers of actions and players. It remains the only algorithm that does so while relying solely on observing the other players' actual actions (not their mixed strategies). It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to adapt to the other players' strategies when they appear stationary, and otherwise to retreat to a precomputed equilibrium strategy. We provide experimental results suggesting that AWESOME converges quickly in practice. The techniques used to prove AWESOME's properties are fundamentally different from those used for previous algorithms and may help in analyzing future multiagent learning algorithms as well.
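The "adapt when stationary, otherwise retreat to equilibrium" idea can be sketched in code. The following is a minimal, simplified illustration, not the paper's actual algorithm: it omits AWESOME's epoch schedule, its two hypothesis flags, and its shrinking tolerance thresholds, and all names (`AwesomeSketch`, `observe`, `act`, the fixed `epsilon` and `window` parameters) are hypothetical choices for this sketch.

```python
import random
from collections import Counter

class AwesomeSketch:
    """Simplified sketch of the AWESOME idea (not the paper's full algorithm):
    best-respond when the opponent looks stationary, otherwise retreat to a
    precomputed equilibrium strategy. The epsilon tolerance and window size
    are fixed here, whereas the real algorithm tightens them over epochs."""

    def __init__(self, equilibrium_strategy, best_response, epsilon=0.1, window=50):
        self.equilibrium_strategy = equilibrium_strategy  # dict: own action -> probability
        self.best_response = best_response  # maps an opponent action distribution to an own action
        self.epsilon = epsilon              # stationarity tolerance (assumed fixed)
        self.window = window                # number of recent observations per comparison window
        self.history = []                   # observed opponent actions (actual actions only)

    def observe(self, opponent_action):
        self.history.append(opponent_action)

    def _empirical(self, sample):
        # Empirical action distribution of a sample of observed actions.
        counts = Counter(sample)
        n = len(sample)
        return {a: c / n for a, c in counts.items()}

    def opponent_looks_stationary(self):
        # Compare the empirical distributions of two consecutive windows; if any
        # action's frequency shifts by more than epsilon, reject stationarity.
        if len(self.history) < 2 * self.window:
            return False
        old = self._empirical(self.history[-2 * self.window:-self.window])
        new = self._empirical(self.history[-self.window:])
        actions = set(old) | set(new)
        return all(abs(old.get(a, 0) - new.get(a, 0)) <= self.epsilon for a in actions)

    def act(self, rng=random):
        if self.opponent_looks_stationary():
            # Adapt: best-respond to the recent empirical distribution.
            return self.best_response(self._empirical(self.history[-self.window:]))
        # Otherwise retreat to the precomputed equilibrium strategy.
        actions, probs = zip(*self.equilibrium_strategy.items())
        return rng.choices(actions, weights=probs)[0]
```

Note how the sketch mirrors the abstract's key constraint: the stationarity test uses only the opponent's actual played actions, never their mixed strategy.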