Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning

  • Authors:
  • Jacob W. Crandall; Michael A. Goodrich

  • Affiliations:
  • Computing and Information Science Program, Masdar Institute of Science and Technology, Abu Dhabi, UAE; Computer Science Department, Brigham Young University, Provo, USA

  • Venue:
  • Machine Learning
  • Year:
  • 2011

Abstract

We consider the problem of learning in repeated general-sum matrix games when a learning algorithm can observe the actions but not the payoffs of its associates. Because learning associates make the environment non-stationary, most state-of-the-art algorithms perform poorly in some important repeated games, as they are unable to make profitable compromises. To make these compromises, an agent must effectively balance competing objectives, including bounding losses, playing optimally with respect to current beliefs, and taking calculated, but profitable, risks. In this paper, we present, discuss, and analyze M-Qubed, a reinforcement learning algorithm designed to overcome these deficiencies by encoding and balancing best-response, cautious, and optimistic learning biases. We show that M-Qubed learns to make profitable compromises across a wide range of repeated matrix games played with many kinds of learners. Specifically, we prove that M-Qubed's average payoffs meet or exceed its maximin value in the limit. Additionally, we show that, in two-player games, M-Qubed's average payoffs approach the value of the Nash bargaining solution in self-play. Furthermore, it performs very well when associating with other learners, as evidenced by its robust behavior in round-robin and evolutionary tournaments of two-player games. These results demonstrate that an agent can learn to make good compromises, and hence receive high payoffs, in repeated games by effectively encoding and balancing best-response, cautious, and optimistic learning biases.
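
The sketch below is not the authors' M-Qubed algorithm; it is a minimal, hypothetical illustration of the three learning biases named in the abstract in a repeated 2x2 matrix game: a best-response bias via a standard Q-learning update, an optimistic bias via optimistic initialization of Q-values, and a cautious bias via comparison of the realized average payoff against the game's maximin (security) value. All payoffs, parameters, and function names are illustrative assumptions.

import random

# Hypothetical row-player payoff matrix (Prisoner's Dilemma):
# actions 0 = cooperate, 1 = defect; columns are the associate's actions.
PAYOFFS = [[3.0, 0.0],
           [5.0, 1.0]]

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.05
R_MAX = 5.0  # highest payoff in the game, used for optimistic initialization

def maximin_value(payoffs):
    """Security (maximin) value over pure strategies -- a cautious lower benchmark."""
    return max(min(row) for row in payoffs)

def play(n_rounds=10_000, opponent=lambda history: 1):
    # State = associate's previous action; optimistic initial Q-values make
    # unexplored outcomes look attractive (the optimistic bias).
    q = {s: [R_MAX / (1 - GAMMA)] * 2 for s in (0, 1)}
    state, total, history = 0, 0.0, []
    for _ in range(n_rounds):
        # Best-response bias: act greedily w.r.t. current Q, with exploration.
        if random.random() < EPSILON:
            my_act = random.randrange(2)
        else:
            my_act = max((0, 1), key=lambda a: q[state][a])
        opp_act = opponent(history)
        reward = PAYOFFS[my_act][opp_act]
        total += reward
        next_state = opp_act
        # Standard Q-learning update toward the observed payoff.
        q[state][my_act] += ALPHA * (reward + GAMMA * max(q[next_state]) - q[state][my_act])
        history.append((my_act, opp_act))
        state = next_state
    # Cautious bias (sketch): report average payoff next to the maximin value.
    return total / n_rounds, maximin_value(PAYOFFS)

if __name__ == "__main__":
    avg_payoff, security = play()
    print(f"average payoff {avg_payoff:.2f} vs. maximin value {security:.2f}")

Against the default always-defect associate, this toy learner's average payoff settles near the maximin value of 1; the paper's contribution is precisely how to balance these biases so that, against other learners and in self-play, payoffs instead approach the Nash bargaining solution.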