Learning to cooperate via policy search

  • Authors:
  • Leonid Peshkin; Kee-Eung Kim; Nicolas Meuleau; Leslie Pack Kaelbling

  • Affiliations:
  • MIT AI Laboratory, Cambridge, MA; Computer Science Dept., Providence, RI; MIT AI Laboratory, Cambridge, MA; MIT AI Laboratory, Cambridge, MA

  • Venue:
  • UAI'00: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence
  • Year:
  • 2000

Abstract

Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain.
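
The idea of distributed gradient-based policy search in a game with a shared payoff can be illustrated with a minimal sketch. This is not the paper's algorithm or its soccer domain: it assumes a hypothetical, stateless and fully observable 2x2 coordination game (the `PAYOFF` matrix), softmax policies, and a REINFORCE-style update with a running-average baseline. All names and constants are illustrative assumptions. Each agent adjusts only its own parameters, using only its own action and the common reward.

```python
import math
import random

# Hypothetical shared payoff (an assumption, not from the paper): both agents
# receive the same reward, which is what makes the game cooperative.
PAYOFF = [[1.0, 0.0],
          [0.0, 0.5]]


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Sample action 0 or 1 from a two-action distribution."""
    return 0 if rng.random() < probs[0] else 1


def train(episodes=5000, lr=0.1, seed=0):
    """Run independent REINFORCE-style updates for two agents.

    Returns the final action distribution of each agent.
    """
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # one preference vector per agent
    baseline = 0.0                    # running average of the shared reward
    for _ in range(episodes):
        probs = [softmax(t) for t in theta]
        actions = [sample(p, rng) for p in probs]
        reward = PAYOFF[actions[0]][actions[1]]
        for i in range(2):
            # Agent i uses only its own action and the common reward.
            for a in range(2):
                # Gradient of log softmax probability of the chosen action.
                grad_log = (1.0 if a == actions[i] else 0.0) - probs[i][a]
                theta[i][a] += lr * (reward - baseline) * grad_log
        baseline += 0.01 * (reward - baseline)
    return [softmax(t) for t in theta]


if __name__ == "__main__":
    for i, policy in enumerate(train()):
        print(f"agent {i}: {policy}")
```

In this toy game both coordinated joint actions, (0, 0) and (1, 1), are Nash equilibria, although only (0, 0) yields the highest payoff. Which one a given run settles on depends on the random seed and the step size, which gives a small-scale picture of why it is useful to compare local optima of gradient ascent with Nash equilibria.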