A multiagent reinforcement learning algorithm using extended optimal response

  • Authors:
  • Nobuo Suematsu; Akira Hayashi

  • Affiliations:
  • Hiroshima City University, Hiroshima, Japan; Hiroshima City University, Hiroshima, Japan

  • Venue:
  • Proceedings of the first international joint conference on Autonomous agents and multiagent systems: part 1
  • Year:
  • 2002

Abstract

Stochastic games provide a theoretical framework for multiagent reinforcement learning. Within this framework, Littman proposed a multiagent reinforcement learning algorithm for zero-sum stochastic games, which Hu and Wellman extended to general-sum games. Given a stochastic game, if all agents learn with their algorithm, we can expect the agents' policies to converge to a Nash equilibrium. However, agents using their algorithm always try to converge to a Nash equilibrium, regardless of the policies used by the other agents. Moreover, when multiple Nash equilibria exist, the agents must agree on which equilibrium to reach. Thus, their algorithm lacks adaptability in this sense. In this paper, we propose a multiagent reinforcement learning algorithm based on the extended optimal response, which we introduce in this paper. The algorithm converges to a Nash equilibrium when the other agents are adaptable; otherwise, it makes an optimal response to their play. We also provide empirical results on three simple stochastic games, which show that the algorithm behaves as intended.
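
To make the "optimal response" idea concrete, the following is a minimal sketch in a much simpler setting than the paper's: a fictitious-play-style learner in a repeated 2x2 coordination game that best-responds to the opponent's empirical action frequencies. It is not the extended optimal response algorithm itself (which operates on stochastic games with value estimation); the payoff matrix, helper names, and opponent policies below are assumptions made for illustration only.

```python
# Illustrative sketch (not the paper's algorithm): a fictitious-play-style
# best-response learner in a repeated 2x2 matrix game. It conveys the idea
# of responding optimally to the other agent's observed policy, but omits
# the stochastic-game dynamics and the extension introduced in the paper.
import random

# Coordination game payoffs for the learner: matching actions pays 1,
# so (0, 0) and (1, 1) are the two pure-strategy Nash equilibria.
ROW_PAYOFF = [[1.0, 0.0],
              [0.0, 1.0]]

def best_response(opponent_counts):
    """Best response to the opponent's empirical action frequencies."""
    total = sum(opponent_counts)
    if total == 0:
        return random.randrange(2)            # no data yet: act randomly
    p = [c / total for c in opponent_counts]  # empirical opponent policy
    values = [sum(ROW_PAYOFF[a][b] * p[b] for b in range(2)) for a in range(2)]
    return max(range(2), key=lambda a: values[a])

def play(opponent_policy, rounds=200):
    """Run the learner against a (possibly adaptive) opponent policy."""
    counts = [0, 0]   # how often the opponent has played each action
    history = []
    for _ in range(rounds):
        a = best_response(counts)
        b = opponent_policy(history)
        counts[b] += 1
        history.append((a, b))
    return history

# Against an adaptive opponent (here: one that copies the learner's last
# action), play settles into one of the equilibria; against a fixed
# opponent, the learner simply keeps best-responding to that fixed policy.
adaptive = lambda h: h[-1][0] if h else random.randrange(2)
fixed = lambda h: 0
print("vs adaptive:", play(adaptive)[-5:])
print("vs fixed:   ", play(fixed)[-5:])
```

This captures the adaptability the abstract contrasts with equilibrium-only learners: the same learning rule yields equilibrium play when the other agent adapts, and an optimal response when it does not.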