Policy gradient methods are useful approaches to reinforcement learning in multi-agent systems. With these methods, a decision problem in a multi-agent system can be divided into a set of independent decision problems, one per agent, by adopting autonomous decentralized control. In addition, these methods use parameterized stochastic policies, whose parameters are updated stochastically to maximize the expected reward. In this paper, we first treat each agent's decision problem as the minimization of an objective function. We adopt a Boltzmann distribution for the stochastic policy, with the objective function serving as the energy of that distribution. We then show that the objective function can be defined from a state-value function, the sum of the weight parameters of state-action rules, and heuristic potentials. Finally, we apply this method to pursuit problems. Experimental results show that the method, used with these objective functions, produces episodes as short as those of a Q-learning method, and that it easily handles constraints such as time-window restrictions on episode length while exploiting heuristic knowledge such as an attractive potential toward the target.
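The core idea above can be sketched in a few lines: an action's probability under a Boltzmann (Gibbs) distribution decreases with its energy, and the energy is assembled from the three terms the abstract names. This is a minimal illustrative sketch, not the authors' exact formulation; the function names and the simple additive combination in `energy` are assumptions.

```python
import numpy as np

def boltzmann_policy(energies, temperature=1.0):
    """Action probabilities from a Boltzmann distribution:
    lower energy -> higher probability."""
    z = -np.asarray(energies, dtype=float) / temperature
    z -= z.max()          # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def energy(state_value, rule_weights, heuristic_potential):
    # Illustrative combination of the three components named in the
    # abstract; the actual form and weighting are assumptions.
    return -state_value + sum(rule_weights) + heuristic_potential

# Example: two candidate actions for one agent
e = [energy(0.5, [0.1, 0.2], 0.3),   # energy  0.1
     energy(0.9, [0.1], 0.1)]        # energy -0.7
probs = boltzmann_policy(e, temperature=0.5)
```

Here the second action has lower energy (higher state value, smaller potential), so it receives the larger probability; raising the temperature flattens the distribution toward uniform exploration.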