Parallel reinforcement learning with linear function approximation

Authors:
Matthew Grounds; Daniel Kudenko
Affiliations:
Department of Computer Science, University of York, York, UK; Department of Computer Science, University of York, York, UK
Venue:
ALAMAS'05/ALAMAS'06/ALAMAS'07 Proceedings of the 5th, 6th and 7th European conference on Adaptive and learning agents and multi-agent systems: adaptation and multi-agent learning
Year:
2005

Abstract

In this paper, we investigate the use of parallelization in reinforcement learning (RL), with the goal of learning optimal policies for single-agent RL problems more quickly by using parallel hardware. Our approach is based on agents using the SARSA(λ) algorithm, with value functions represented using linear function approximators. In our proposed method, each agent learns independently in a separate simulation of the single-agent problem. The agents periodically exchange information extracted from the weights of their approximators, accelerating convergence towards the optimal policy. We develop three increasingly efficient versions of this approach to parallel RL, and present empirical results for an implementation of the methods on a Beowulf cluster.
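The abstract combines two mechanisms: SARSA(λ) with a linear value function, and several independent learners that periodically pool what their weights have learned. The code below is a minimal sketch under stated assumptions, not the authors' implementation. The toy corridor task, the one-hot features φ(s, a), the hyperparameters, and all helper names (`features`, `step`, `sarsa_lambda_episode`, `average_weights`, `parallel_sarsa`) are hypothetical. Plain element-wise averaging of weights stands in for the paper's exchange step, because the abstract does not describe its three versions. The agents also run one after another in a single process rather than on a cluster. Each step applies the standard accumulating-trace update δ = r + γ·θᵀφ(s′, a′) − θᵀφ(s, a), then e ← γλe + φ(s, a), then θ ← θ + αδe.

```python
import random

# Toy task (hypothetical): a corridor of cells 0..N_STATES-1, start at cell 0,
# reward -1 per step, episode ends on reaching the right-most cell.
N_STATES = 8
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right
N_FEATURES = N_STATES * len(ACTIONS)


def features(s, a_idx):
    """One-hot phi(s, a): an assumed feature map, the simplest linear approximator."""
    phi = [0.0] * N_FEATURES
    phi[s * len(ACTIONS) + a_idx] = 1.0
    return phi


def q_value(theta, s, a_idx):
    # Linear value function: Q(s, a) = theta . phi(s, a)
    return sum(t * f for t, f in zip(theta, features(s, a_idx)))


def epsilon_greedy(theta, s, eps, rng):
    # Explore with probability eps, otherwise pick a greedy action (ties broken at random).
    if rng.random() < eps:
        return rng.randrange(len(ACTIONS))
    values = [q_value(theta, s, a) for a in range(len(ACTIONS))]
    best = max(values)
    return rng.choice([a for a, v in enumerate(values) if v == best])


def step(s, a_idx):
    # Deterministic move clipped to the corridor; returns (next_state, reward, done).
    s2 = min(max(s + ACTIONS[a_idx], 0), N_STATES - 1)
    return s2, -1.0, s2 == N_STATES - 1


def sarsa_lambda_episode(theta, rng, alpha=0.1, gamma=0.95, lam=0.8, eps=0.1, max_steps=200):
    """One episode of SARSA(lambda) with accumulating traces; updates theta in place."""
    e = [0.0] * N_FEATURES  # eligibility trace vector
    s = 0
    a = epsilon_greedy(theta, s, eps, rng)
    for _ in range(max_steps):
        s2, r, done = step(s, a)
        # TD error: delta = r + gamma * Q(s', a') - Q(s, a), with Q(terminal, .) = 0
        delta = r - q_value(theta, s, a)
        if not done:
            a2 = epsilon_greedy(theta, s2, eps, rng)
            delta += gamma * q_value(theta, s2, a2)
        # e <- gamma * lambda * e + phi(s, a);  theta <- theta + alpha * delta * e
        e = [gamma * lam * ei + fi for ei, fi in zip(e, features(s, a))]
        theta[:] = [t + alpha * delta * ei for t, ei in zip(theta, e)]
        if done:
            return
        s, a = s2, a2


def average_weights(thetas):
    """Hypothetical exchange step: every agent adopts the element-wise mean weight vector."""
    mean = [sum(col) / len(thetas) for col in zip(*thetas)]
    for theta in thetas:
        theta[:] = mean


def parallel_sarsa(n_agents=4, episodes=50, sync_every=5, seed=0):
    # Each agent has its own simulation (own RNG) and its own weights. They run one
    # after another here; on a cluster each agent would be a separate process.
    rngs = [random.Random(seed + i) for i in range(n_agents)]
    thetas = [[0.0] * N_FEATURES for _ in range(n_agents)]
    for ep in range(1, episodes + 1):
        for theta, rng in zip(thetas, rngs):
            sarsa_lambda_episode(theta, rng)
        if ep % sync_every == 0:
            average_weights(thetas)
    return thetas


if __name__ == "__main__":
    weights = parallel_sarsa()[0]
    greedy = ["right" if q_value(weights, s, 1) >= q_value(weights, s, 0) else "left"
              for s in range(N_STATES - 1)]
    print(greedy)
```

With the default settings the final episode is a synchronization point, so every agent returns the same averaged weight vector, and acting greedily on θᵀφ(s, a) gives the learned policy. Averaging is only one simple choice. With one-hot features it amounts to averaging Q-tables, and nothing here claims it matches the exchange schemes evaluated in the paper.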