Improving optimality of neural rewards regression for data-efficient batch near-optimal policy identification

  • Authors:
  • Daniel Schneegaß, Steffen Udluft, Thomas Martinetz

  • Affiliations:
  • Information & Communications, Learning Systems, Siemens AG, Corporate Technology, Munich, Germany, and Institute for Neuro- and Bioinformatics, University of Lübeck, Lübeck, Germany; Information & Communications, Learning Systems, Siemens AG, Corporate Technology, Munich, Germany; Institute for Neuro- and Bioinformatics, University of Lübeck, Lübeck, Germany

  • Venue:
  • ICANN'07: Proceedings of the 17th International Conference on Artificial Neural Networks
  • Year:
  • 2007

Abstract

In this paper we present two substantial extensions of Neural Rewards Regression (NRR) [1]. In order to obtain a less biased estimator of the Bellman Residual and to strengthen the regression character of NRR, we incorporate an improved, Auxiliared Bellman Residual [2] and provide, to the best of our knowledge, the first Neural-Network-based implementation of this novel Bellman Residual minimisation technique. Furthermore, we extend NRR to Policy Gradient Neural Rewards Regression (PGNRR), in which the policy is directly encoded by a policy network. PGNRR benefits from both the data-efficiency of the Rewards Regression approach and the directness of policy search methods. PGNRR also overcomes a crucial drawback of NRR by considerably enlarging the applicable problem class to include continuous action spaces.
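
To make the core idea concrete, the following is a minimal, hypothetical sketch (not the authors' architecture) of Bellman Residual minimisation over a fixed batch of transitions, with a policy network supplying the next action so that continuous action spaces need no maximisation step. The network sizes, data, and hyperparameters are placeholders, and the auxiliary-function debiasing of the Auxiliared Bellman Residual as well as the weight-sharing rewards-regression structure from the paper are omitted here.

```python
# Hypothetical sketch: plain squared Bellman residual minimised jointly over
# a Q-network and a policy network for continuous actions (PyTorch assumed).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GAMMA = 3, 1, 0.9

# Q-network: maps (state, action) -> scalar value estimate.
q_net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 32), nn.Tanh(),
                      nn.Linear(32, 1))
# Policy network: maps state -> continuous action in [-1, 1].
pi_net = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(),
                       nn.Linear(32, ACTION_DIM), nn.Tanh())

optimiser = torch.optim.Adam(list(q_net.parameters()) + list(pi_net.parameters()),
                             lr=1e-3)

# Fixed batch of transitions (s, a, r, s'); random placeholders standing in
# for the pre-recorded observations assumed in the batch setting.
s  = torch.randn(128, STATE_DIM)
a  = torch.rand(128, ACTION_DIM) * 2 - 1
r  = torch.randn(128, 1)
s2 = torch.randn(128, STATE_DIM)

for step in range(500):
    q_sa = q_net(torch.cat([s, a], dim=1))
    # The next action comes from the policy network, which is what makes
    # continuous action spaces tractable (no max over actions is needed).
    a2 = pi_net(s2)
    target = r + GAMMA * q_net(torch.cat([s2, a2], dim=1))
    # Plain squared Bellman residual; the paper's Auxiliared Bellman Residual
    # additionally uses an auxiliary function to reduce the bias of this
    # estimator, which is not reproduced in this sketch.
    loss = ((q_sa - target) ** 2).mean()
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```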