Policy gradient methods in multi-agent systems: pursuit problem

  • Authors:
  • Seiji Ishihara; Harukazu Igarashi

  • Affiliations:
  • School of Engineering, Kinki University, 1 Takayaumenobe, Higashi-Hiroshima-shi, 739-2116 Japan (both authors)

  • Venue:
  • Design and application of hybrid intelligent systems
  • Year:
  • 2003

Abstract

Policy gradient methods are useful approaches to reinforcement learning in multi-agent systems. With these methods, the decision problem of a multi-agent system can be divided into a set of independent decision problems, one per agent, by adopting autonomous decentralized control. These methods use parameterized stochastic policies whose parameters are updated stochastically to maximize the expected reward. In this paper, we first formulate each agent's decision problem as the minimization of an objective function. We adopt a Boltzmann distribution as the stochastic policy, with the objective function serving as the energy of that distribution. Next, we show that the objective function can be defined by a state-value function, the sum of the weight parameters of state-action rules, and heuristic potentials. Moreover, we apply this method to pursuit problems. Experimental results show that, with these objective functions, the method produces episodes as short as those of Q-learning, readily handles constraints such as time-window restrictions on episode length, and can exploit heuristic knowledge such as an attractive potential toward the target.
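
To make the Boltzmann-policy formulation concrete, the following is a minimal sketch, not the authors' implementation: a tabular energy E(s, a) given directly by the weight parameters theta, a Boltzmann policy over that energy, and a REINFORCE-style stochastic update of theta toward higher expected reward. The toy environment, its reward, and all sizes and constants (N_STATES, N_ACTIONS, T, ALPHA) are illustrative assumptions; the paper's full objective function also includes a state-value term and heuristic potentials, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 5, 4   # toy sizes, chosen for illustration only
T = 1.0                      # temperature of the Boltzmann distribution
ALPHA = 0.1                  # learning rate

# theta[s, a]: weight parameter of the state-action rule (s, a); in this
# sketch the energy is simply E(s, a) = theta[s, a].
theta = np.zeros((N_STATES, N_ACTIONS))

def policy(state):
    """Boltzmann policy: pi(a|s) = exp(-E(s,a)/T) / sum_b exp(-E(s,b)/T)."""
    logits = -theta[state] / T
    logits -= logits.max()           # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def run_episode(length=10):
    """Sample one episode; reward 1 whenever action 0 is taken, a toy
    stand-in for 'the pursuer moved toward the target'."""
    trajectory, total_reward = [], 0.0
    for _ in range(length):
        state = int(rng.integers(N_STATES))
        action = int(rng.choice(N_ACTIONS, p=policy(state)))
        trajectory.append((state, action))
        if action == 0:
            total_reward += 1.0
    return trajectory, total_reward

def update(trajectory, total_reward):
    """REINFORCE update: theta += alpha * R * grad(log pi). For a Boltzmann
    policy, the gradient of log pi(a|s) w.r.t. theta[s, :] is
    (pi(.|s) - onehot(a)) / T, so rewarded actions get lower energy
    and hence higher probability."""
    for state, action in trajectory:
        probs = policy(state)
        one_hot = np.eye(N_ACTIONS)[action]
        theta[state] += ALPHA * total_reward * (probs - one_hot) / T

for _ in range(300):
    update(*run_episode())

print("pi(.|s=0) after training:", np.round(policy(0), 3))
```

After training, the probability of the rewarded action rises toward 1 in every state, since lowering the energy of well-rewarded state-action rules is exactly how the Boltzmann policy concentrates probability mass.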