Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning

  • Authors:
  • Jacob W. Crandall; Michael A. Goodrich

  • Affiliations:
  • Computing and Information Science Program, Masdar Institute of Science and Technology, Abu Dhabi, UAE; Computer Science Department, Brigham Young University, Provo, USA

  • Venue:
  • Machine Learning
  • Year:
  • 2011

Abstract

We consider the problem of learning in repeated general-sum matrix games when a learning algorithm can observe the actions but not the payoffs of its associates. Because learning associates make the environment non-stationary, most state-of-the-art algorithms perform poorly in some important repeated games, as they are unable to make profitable compromises. To make these compromises, an agent must effectively balance competing objectives, including bounding losses, playing optimally with respect to current beliefs, and taking calculated, but profitable, risks. In this paper, we present, discuss, and analyze M-Qubed, a reinforcement learning algorithm designed to overcome these deficiencies by encoding and balancing best-response, cautious, and optimistic learning biases. We show that M-Qubed learns to make profitable compromises across a wide range of repeated matrix games played with many kinds of learners. Specifically, we prove that M-Qubed's average payoffs meet or exceed its maximin value in the limit. Additionally, we show that, in two-player games, M-Qubed's average payoffs approach the value of the Nash bargaining solution in self-play. Furthermore, it performs very well when associating with other learners, as evidenced by its robust behavior in round-robin and evolutionary tournaments of two-player games. These results demonstrate that an agent can learn to make good compromises, and hence receive high payoffs, in repeated games by effectively encoding and balancing best-response, cautious, and optimistic learning biases.
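
The sketch below is not the authors' M-Qubed algorithm; it is a minimal, hypothetical illustration of the three learning biases named in the abstract in a repeated 2x2 matrix game: a best-response bias via a standard Q-learning update, an optimistic bias via optimistic initialization of Q-values, and a cautious bias via comparison of the realized average payoff against the game's maximin (security) value. All payoffs, parameters, and function names are illustrative assumptions.

import random

# Hypothetical row-player payoff matrix (Prisoner's Dilemma):
# actions 0 = cooperate, 1 = defect; columns are the associate's actions.
PAYOFFS = [[3.0, 0.0],
           [5.0, 1.0]]

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.05
R_MAX = 5.0  # highest payoff in the game, used for optimistic initialization

def maximin_value(payoffs):
    """Security (maximin) value over pure strategies -- a cautious lower benchmark."""
    return max(min(row) for row in payoffs)

def play(n_rounds=10_000, opponent=lambda history: 1):
    # State = associate's previous action; optimistic initial Q-values make
    # unexplored outcomes look attractive (the optimistic bias).
    q = {s: [R_MAX / (1 - GAMMA)] * 2 for s in (0, 1)}
    state, total, history = 0, 0.0, []
    for _ in range(n_rounds):
        # Best-response bias: act greedily w.r.t. current Q, with exploration.
        if random.random() < EPSILON:
            my_act = random.randrange(2)
        else:
            my_act = max((0, 1), key=lambda a: q[state][a])
        opp_act = opponent(history)
        reward = PAYOFFS[my_act][opp_act]
        total += reward
        next_state = opp_act
        # Standard Q-learning update toward the observed payoff.
        q[state][my_act] += ALPHA * (reward + GAMMA * max(q[next_state]) - q[state][my_act])
        history.append((my_act, opp_act))
        state = next_state
    # Cautious bias (sketch): report average payoff next to the maximin value.
    return total / n_rounds, maximin_value(PAYOFFS)

if __name__ == "__main__":
    avg_payoff, security = play()
    print(f"average payoff {avg_payoff:.2f} vs. maximin value {security:.2f}")

Against the default always-defect associate, this toy learner's average payoff settles near the maximin value of 1; the paper's contribution is precisely how to balance these biases so that, against other learners and in self-play, payoffs instead approach the Nash bargaining solution.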