Introduction to Reinforcement Learning
Least-squares policy iteration
The Journal of Machine Learning Research
Least-Squares Policy Iteration (LSPI) [3] is an approximate reinforcement learning technique capable of training policies over large, continuous state spaces. Unfortunately, the computational requirements of LSPI scale poorly with the number of agents in the system. Prior work has addressed this problem, notably the Coordinated Reinforcement Learning (CRL) approach of Guestrin et al. [1], but CRL requires prior information about the learning system, such as the inter-agent dependencies and the form of the Q-function. We present a hybrid gradient-ascent/LSPI approach that uses LSPI to train multi-agent policies efficiently. Its computational requirements scale as O(N), where N is the number of agents, and it does not require the prior knowledge that CRL does. Finally, we demonstrate our algorithm on a standard multi-agent network control problem [1].
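For readers unfamiliar with the single-agent building block, the following is a minimal sketch of standard LSPI as described in [3], not of the hybrid gradient-ascent/LSPI method summarized above. The two-state toy problem, the one-hot features, the small ridge term, and all names (`phi`, `lstdq`, `lspi`, `step`) are assumptions made for illustration only.

```python
import numpy as np

# Toy problem (an assumption for illustration): 2 states, 2 actions.
# Action 0 stays in the current state, action 1 switches state.
# Reward is 1 whenever the next state is state 1.
N_STATES, N_ACTIONS, GAMMA = 2, 2, 0.9


def phi(s, a):
    """One-hot feature vector over (state, action) pairs."""
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f


def greedy(w, s):
    """Greedy action under the linear Q-function Q(s, a) = phi(s, a) . w."""
    return int(np.argmax([phi(s, a) @ w for a in range(N_ACTIONS)]))


def lstdq(samples, w):
    """Policy evaluation step: solve A w' = b for the policy greedy in w."""
    k = N_STATES * N_ACTIONS
    A = 1e-6 * np.eye(k)  # small ridge term (assumed) to keep A well conditioned
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, greedy(w, s_next))
        A += np.outer(f, f - GAMMA * f_next)
        b += f * r
    return np.linalg.solve(A, b)


def lspi(samples, n_iter=20, tol=1e-8):
    """Alternate evaluation (lstdq) and implicit greedy improvement."""
    w = np.zeros(N_STATES * N_ACTIONS)
    for _ in range(n_iter):
        w_new = lstdq(samples, w)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w


def step(s, a):
    """Deterministic toy dynamics; returns (reward, next_state)."""
    s_next = s if a == 0 else 1 - s
    return (1.0 if s_next == 1 else 0.0), s_next


# One sample per (state, action) pair.
samples = []
for s in range(N_STATES):
    for a in range(N_ACTIONS):
        r, s_next = step(s, a)
        samples.append((s, a, r, s_next))

w = lspi(samples)
policy = [greedy(w, s) for s in range(N_STATES)]
```

In this sketch the feature vector is indexed by the action, so a naive multi-agent extension over joint actions would need one feature block per joint action, and the number of joint actions grows exponentially with the number of agents. That is one way to read the scaling problem the abstract refers to, although the abstract itself does not spell out the cause.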