Multi-Agent Least-Squares Policy Iteration

  • Authors: Victor Palmer
  • Affiliations: Texas A&M University, College Station, Texas, email: vpalmer@cs.tamu.edu
  • Venue: Proceedings of ECAI 2006: 17th European Conference on Artificial Intelligence, August 29 -- September 1, 2006, Riva del Garda, Italy
  • Year: 2006

Abstract

Least-Squares Policy Iteration (LSPI) [3] is an approximate reinforcement learning technique capable of training policies over large, continuous state spaces. Unfortunately, the computational requirements of LSPI scale poorly with the number of system agents. Work has been done to address this problem, such as the Coordinated Reinforcement Learning (CRL) approach of Guestrin et al. [1], but CRL requires prior information about the learning system, such as the inter-agent dependencies and the form of the Q-function. We demonstrate a hybrid gradient-ascent/LSPI approach that uses LSPI to efficiently train multi-agent policies. Our approach has computational requirements that scale as O(N), where N is the number of system agents, and it does not have the prior-knowledge requirements of CRL. Finally, we demonstrate our algorithm on a standard multi-agent network control problem [1].
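The abstract does not spell out the algorithmic details, but the least-squares step at the heart of LSPI, and how a per-agent decomposition could keep the cost linear in the number of agents, can be illustrated with a rough sketch. The Python snippet below is an assumption-laden illustration, not the authors' method: it assumes each agent i keeps an independent linear Q-function Q_i(s, a_i) = phi_i(s, a_i)^T w_i fitted by an LSTDQ-style least-squares solve over a shared batch of transitions, so total cost grows as O(N) in the agent count N. The `lstdq` helper, the feature dimensions, and the random placeholder data are all hypothetical.

```python
import numpy as np

def lstdq(phi, phi_next, rewards, gamma=0.95, reg=1e-6):
    """LSTDQ fixed-point solve: A w = b, with A = Phi^T (Phi - gamma Phi') and b = Phi^T r.

    phi      -- features of sampled (state, action) pairs, shape (n_samples, k)
    phi_next -- features of the next state under the current greedy action
    rewards  -- sampled immediate rewards, shape (n_samples,)
    """
    k = phi.shape[1]
    A = phi.T @ (phi - gamma * phi_next) + reg * np.eye(k)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)

# Toy synthetic batch (placeholder data, not from the paper's experiments):
# 500 shared transitions, 8 features per agent, 4 agents.
rng = np.random.default_rng(0)
n_samples, n_features, n_agents = 500, 8, 4

weights = []
for i in range(n_agents):
    # Each agent's weight vector is solved independently from the shared batch,
    # so the loop cost is linear in the number of agents.
    phi = rng.normal(size=(n_samples, n_features))
    phi_next = rng.normal(size=(n_samples, n_features))
    rewards = rng.normal(size=n_samples)
    weights.append(lstdq(phi, phi_next, rewards))

print("fitted per-agent weight vectors:", [w.round(2) for w in weights])
```

Running the snippet just prints one small weight vector per agent; in the paper's setting, the features and rewards would instead come from samples of the multi-agent network control problem, and the gradient-ascent component described in the abstract would adjust how agents' policies are coordinated between least-squares solves.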