Multi-Agent Least-Squares Policy Iteration

  • Authors: Victor Palmer
  • Affiliations: Texas A&M University, College Station, Texas, email: vpalmer@cs.tamu.edu
  • Venue: Proceedings of ECAI 2006: 17th European Conference on Artificial Intelligence, August 29 -- September 1, 2006, Riva del Garda, Italy
  • Year: 2006

Abstract

Least-Squares Policy Iteration (LSPI) [3] is an approximate reinforcement learning technique capable of training policies over large, continuous state spaces. Unfortunately, the computational requirements of LSPI scale poorly with the number of system agents. Work has been done to address this problem, such as the Coordinated Reinforcement Learning (CRL) approach of Guestrin et al. [1], but CRL requires prior information about the learning system, such as the inter-agent dependencies and the form of the Q-function. We demonstrate a hybrid gradient-ascent/LSPI approach that uses LSPI to efficiently train multi-agent policies. Our approach has computational requirements that scale as O(N), where N is the number of system agents, and it does not have the prior-knowledge requirements of CRL. Finally, we demonstrate our algorithm on a standard multi-agent network control problem [1].
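The abstract does not spell out the algorithmic details, but the least-squares step at the heart of LSPI, and how a per-agent decomposition could keep the cost linear in the number of agents, can be illustrated with a rough sketch. The Python snippet below is an assumption-laden illustration, not the authors' method: it assumes each agent i keeps an independent linear Q-function Q_i(s, a_i) = phi_i(s, a_i)^T w_i fitted by an LSTDQ-style least-squares solve over a shared batch of transitions, so total cost grows as O(N) in the agent count N. The `lstdq` helper, the feature dimensions, and the random placeholder data are all hypothetical.

```python
import numpy as np

def lstdq(phi, phi_next, rewards, gamma=0.95, reg=1e-6):
    """LSTDQ fixed-point solve: A w = b, with A = Phi^T (Phi - gamma Phi') and b = Phi^T r.

    phi      -- features of sampled (state, action) pairs, shape (n_samples, k)
    phi_next -- features of the next state under the current greedy action
    rewards  -- sampled immediate rewards, shape (n_samples,)
    """
    k = phi.shape[1]
    A = phi.T @ (phi - gamma * phi_next) + reg * np.eye(k)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)

# Toy synthetic batch (placeholder data, not from the paper's experiments):
# 500 shared transitions, 8 features per agent, 4 agents.
rng = np.random.default_rng(0)
n_samples, n_features, n_agents = 500, 8, 4

weights = []
for i in range(n_agents):
    # Each agent's weight vector is solved independently from the shared batch,
    # so the loop cost is linear in the number of agents.
    phi = rng.normal(size=(n_samples, n_features))
    phi_next = rng.normal(size=(n_samples, n_features))
    rewards = rng.normal(size=n_samples)
    weights.append(lstdq(phi, phi_next, rewards))

print("fitted per-agent weight vectors:", [w.round(2) for w in weights])
```

Running the snippet just prints one small weight vector per agent; in the paper's setting, the features and rewards would instead come from samples of the multi-agent network control problem, and the gradient-ascent component described in the abstract would adjust how agents' policies are coordinated between least-squares solves.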