Solving deep memory POMDPs with recurrent policy gradients

  • Authors:
  • Daan Wierstra; Alexander Foerster; Jan Peters; Jürgen Schmidhuber

  • Affiliations:
  • IDSIA, Manno-Lugano, Switzerland; IDSIA, Manno-Lugano, Switzerland; University of Southern California, Los Angeles, CA; IDSIA, Manno-Lugano, Switzerland

  • Venue:
  • ICANN'07: Proceedings of the 17th International Conference on Artificial Neural Networks
  • Year:
  • 2007

Abstract

This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method that creates limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) requiring long-term memory of past observations. The approach approximates a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a "Long Short-Term Memory" (LSTM) architecture, we are able to outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car driving simulation task.
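
To illustrate the flavor of the approach, the sketch below implements a REINFORCE-style recurrent policy gradient in PyTorch on a toy cue-recall POMDP: the per-step characteristic eligibilities (gradients of the log action probabilities) are weighted by the return and backpropagated through an LSTM, so the memory needed to exploit an observation seen many steps earlier is learned from reward alone. The environment, network sizes, and the simple return-to-go weighting are illustrative stand-ins, not the architecture, baselines, or benchmark tasks of the paper.

```python
import torch
import torch.nn as nn

# Toy deep-memory POMDP (illustrative, not from the paper): a cue (0 or 1) is
# shown only at the first step; after `delay` blank steps the agent must pick
# the action matching the cue to receive reward.
class CuePOMDP:
    def __init__(self, delay=10):
        self.delay = delay

    def reset(self):
        self.cue = torch.randint(0, 2, (1,)).item()
        self.t = 0
        return torch.tensor([1.0, float(self.cue)])    # [cue_visible, cue_value]

    def step(self, action):
        self.t += 1
        done = self.t > self.delay
        reward = float(action == self.cue) if done else 0.0
        return torch.tensor([0.0, 0.0]), reward, done   # blank observation

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim=2, n_actions=2, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden)       # memory over past observations
        self.head = nn.Linear(hidden, n_actions)   # action logits

    def forward(self, obs, state):
        out, state = self.lstm(obs.view(1, 1, -1), state)
        return self.head(out.view(1, -1)), state

env, policy = CuePOMDP(), RecurrentPolicy()
optim = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(2000):
    obs, state, done = env.reset(), None, False
    log_probs, rewards = [], []
    while not done:
        logits, state = policy(obs, state)
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))    # characteristic eligibility
        obs, reward, done = env.step(action.item())
        rewards.append(reward)

    # Weight each step's eligibility by its return-to-go and backpropagate
    # through the LSTM: gradients flow through time, shaping the memory that
    # carries the initial cue to the final decision.
    returns = torch.tensor(rewards).flip(0).cumsum(0).flip(0)
    loss = -(torch.cat(log_probs) * returns).sum()
    optim.zero_grad()
    loss.backward()     # backpropagation through time over the whole episode
    optim.step()
```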