Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions

  • Authors:
  • Minija Tamosiunaite; Tamim Asfour; Florentin Wörgötter

  • Affiliations:
  • University of Göttingen, Bernstein Centre for Computational Neuroscience, Bunsenstr. 10, 37073, Göttingen, Germany and Vytautas Magnus University, Department of Informatics, Vileikos ...; Universität Karlsruhe (TH), Institute for Computer Science and Engineering, Karlsruhe, Germany; University of Göttingen, Bernstein Centre for Computational Neuroscience, Bunsenstr. 10, 37073, Göttingen, Germany

  • Venue:
  • Biological Cybernetics
  • Year:
  • 2009

Abstract

Reinforcement learning methods can be used in robotics applications especially for specific target-oriented problems, for example the reward-based recalibration of goal directed actions. To this end still relatively large and continuous state-action spaces need to be efficiently handled. The goal of this paper is, thus, to develop a novel, rather simple method which uses reinforcement learning with function approximation in conjunction with different reward-strategies for solving such problems. For the testing of our method, we use a four degree-of-freedom reaching problem in 3D-space simulated by a two-joint robot arm system with two DOF each. Function approximation is based on 4D, overlapping kernels (receptive fields) and the state-action space contains about 10,000 of these. Different types of reward structures are being compared, for example, reward-on-touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of a rather large number of states and the continuous action space these reward/punishment strategies allow the system to find a good solution usually within about 20 trials. The efficiency of our method demonstrated in this test scenario suggests that it might be possible to use it on a real robot for problems where mixed rewards can be defined in situations where other types of learning might be difficult.
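
As a rough illustration of function approximation with overlapping 4D receptive fields, the sketch below places Gaussian kernels on a regular grid over four normalised joint angles and learns one weight per kernel with a simple delta rule. It is not the authors' implementation: the abstract does not specify the kernel shape, layout, or learning rule. The grid of 10 centres per dimension (chosen only because it gives 10,000 kernels), the kernel width `SIGMA`, the learning rate, and all function names are assumptions made for this example.

```python
import itertools
import math

N_PER_DIM = 10  # assumption: 10 centres per joint angle -> 10**4 = 10,000 kernels
N_DIMS = 4      # four joint angles (two joints with two DOF each)
SIGMA = 0.1     # assumption: kernel width in normalised joint units

# Kernel centres along one dimension, evenly spaced on [0, 1].
CENTRES = [i / (N_PER_DIM - 1) for i in range(N_PER_DIM)]


def activations(state):
    """Return the normalised activation of every 4D kernel for a state in [0, 1]^4."""
    # A 4D Gaussian on a grid factorises into a product of 1D Gaussians.
    per_dim = [
        [math.exp(-((x - c) ** 2) / (2 * SIGMA ** 2)) for c in CENTRES]
        for x in state
    ]
    acts = [math.prod(combo) for combo in itertools.product(*per_dim)]
    total = sum(acts)
    return [a / total for a in acts]


def predict(weights, state):
    """Approximate the value of a state as an activation-weighted sum of kernel weights."""
    return sum(w * a for w, a in zip(weights, activations(state)))


def update(weights, state, target, lr=0.5):
    """Move the prediction for `state` towards `target` (delta rule); return the error."""
    acts = activations(state)
    error = target - sum(w * a for w, a in zip(weights, acts))
    for i, a in enumerate(acts):
        weights[i] += lr * error * a
    return error
```

Because the kernels overlap, each update also shifts the estimate for neighbouring joint configurations, which is one plausible reason a learner of this kind can generalise from a small number of trials. In a reinforcement learning setting, `target` would typically be a reward-based estimate such as a temporal-difference target; that choice is likewise an assumption here.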