Improving optimality of neural rewards regression for data-efficient batch near-optimal policy identification

  • Authors:
  • Daniel Schneegaß, Steffen Udluft, Thomas Martinetz

  • Affiliations:
  • Information & Communications, Learning Systems, Siemens AG, Corporate Technology, Munich, Germany, and Institute for Neuro- and Bioinformatics, University of Lübeck, Lübeck, Germany; Information & Communications, Learning Systems, Siemens AG, Corporate Technology, Munich, Germany; Institute for Neuro- and Bioinformatics, University of Lübeck, Lübeck, Germany

  • Venue:
  • ICANN'07: Proceedings of the 17th International Conference on Artificial Neural Networks
  • Year:
  • 2007

Abstract

In this paper we present two substantial extensions of Neural Rewards Regression (NRR) [1]. In order to obtain a less biased estimator of the Bellman Residual and to strengthen the regression character of NRR, we incorporate an improved, Auxiliared Bellman Residual [2] and provide, to the best of our knowledge, the first Neural-Network-based implementation of this novel Bellman Residual minimisation technique. Furthermore, we extend NRR to Policy Gradient Neural Rewards Regression (PGNRR), in which the policy is directly encoded by a policy network. PGNRR benefits from both the data-efficiency of the Rewards Regression approach and the directness of policy search methods. PGNRR also overcomes a crucial drawback of NRR by considerably enlarging the applicable problem class to include continuous action spaces.
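
To make the core idea concrete, the following is a minimal, hypothetical sketch (not the authors' architecture) of Bellman Residual minimisation over a fixed batch of transitions, with a policy network supplying the next action so that continuous action spaces need no maximisation step. The network sizes, data, and hyperparameters are placeholders, and the auxiliary-function debiasing of the Auxiliared Bellman Residual as well as the weight-sharing rewards-regression structure from the paper are omitted here.

```python
# Hypothetical sketch: plain squared Bellman residual minimised jointly over
# a Q-network and a policy network for continuous actions (PyTorch assumed).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GAMMA = 3, 1, 0.9

# Q-network: maps (state, action) -> scalar value estimate.
q_net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 32), nn.Tanh(),
                      nn.Linear(32, 1))
# Policy network: maps state -> continuous action in [-1, 1].
pi_net = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(),
                       nn.Linear(32, ACTION_DIM), nn.Tanh())

optimiser = torch.optim.Adam(list(q_net.parameters()) + list(pi_net.parameters()),
                             lr=1e-3)

# Fixed batch of transitions (s, a, r, s'); random placeholders standing in
# for the pre-recorded observations assumed in the batch setting.
s  = torch.randn(128, STATE_DIM)
a  = torch.rand(128, ACTION_DIM) * 2 - 1
r  = torch.randn(128, 1)
s2 = torch.randn(128, STATE_DIM)

for step in range(500):
    q_sa = q_net(torch.cat([s, a], dim=1))
    # The next action comes from the policy network, which is what makes
    # continuous action spaces tractable (no max over actions is needed).
    a2 = pi_net(s2)
    target = r + GAMMA * q_net(torch.cat([s2, a2], dim=1))
    # Plain squared Bellman residual; the paper's Auxiliared Bellman Residual
    # additionally uses an auxiliary function to reduce the bias of this
    # estimator, which is not reproduced in this sketch.
    loss = ((q_sa - target) ** 2).mean()
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```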