Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions

Authors:
Minija Tamosiunaite;Tamim Asfour;Florentin Wörgötter
Affiliations:
University of Göttingen, Bernstein Centre for Computational Neuroscience, Bunsenstr. 10, 37073, Göttingen, Germany and Vytautas Magnus University, Department of Informatics, Vileikos ...;Universität Karlsruhe (TH), Institute for Computer Science and Engineering, Karlsruhe, Germany;University of Göttingen, Bernstein Centre for Computational Neuroscience, Bunsenstr. 10, 37073, Göttingen, Germany
Venue:
Biological Cybernetics
Year:
2009

Abstract

Reinforcement learning methods can be used in robotics applications, especially for specific target-oriented problems such as the reward-based recalibration of goal-directed actions. To this end, relatively large and continuous state-action spaces still need to be handled efficiently. The goal of this paper is thus to develop a novel, rather simple method that uses reinforcement learning with function approximation in conjunction with different reward strategies for solving such problems. To test our method, we use a four-degree-of-freedom reaching problem in 3D space, simulated by a two-joint robot arm system with two DOF per joint. Function approximation is based on 4D overlapping kernels (receptive fields), and the state-action space contains about 10,000 of these. Different types of reward structures are compared, for example reward-on-touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of the rather large number of states and the continuous action space, these reward/punishment strategies allow the system to find a good solution, usually within about 20 trials. The efficiency of our method in this test scenario suggests that it might be possible to use it on a real robot for problems where mixed rewards can be defined, in situations where other types of learning might be difficult.
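
As a rough illustration of what function approximation with 4D overlapping receptive fields can look like, the following is a minimal sketch and not the authors' implementation. It assumes Gaussian kernels on a regular grid with 10 centres per joint angle (giving 10^4 = 10,000 kernels, matching the order of magnitude in the abstract), joint angles normalised to [0, 1], and a plain delta-rule update towards a target value. The names `N_PER_DIM`, `SIGMA`, `activations`, `value` and `update_towards` are hypothetical, as are the kernel width and learning rate.

```python
import itertools
import math

N_PER_DIM = 10   # assumption: 10 centres per joint angle -> 10**4 kernels
SIGMA = 0.1      # assumption: kernel width in normalised joint units

# Kernel centres on a regular grid over the normalised 4D joint space [0, 1]^4.
GRID = [(i + 0.5) / N_PER_DIM for i in range(N_PER_DIM)]
CENTRES = list(itertools.product(GRID, repeat=4))


def activations(state):
    """Normalised Gaussian activation of every receptive field for a 4D state."""
    raw = [
        math.exp(-sum((s - c) ** 2 for s, c in zip(state, centre))
                 / (2.0 * SIGMA ** 2))
        for centre in CENTRES
    ]
    total = sum(raw)
    return [r / total for r in raw]


def value(weights, state):
    """Approximated value: activation-weighted sum of the kernel weights."""
    return sum(w * a for w, a in zip(weights, activations(state)))


def update_towards(weights, state, target, learning_rate=0.5):
    """Delta-rule step: move the estimate at `state` towards `target`.

    Each weight changes in proportion to its kernel's activation, so
    overlapping neighbours share the update. Returns the error before the step.
    """
    acts = activations(state)
    error = target - sum(w * a for w, a in zip(weights, acts))
    for i, a in enumerate(acts):
        weights[i] += learning_rate * error * a
    return error


weights = [0.0] * len(CENTRES)
state = (0.3, 0.3, 0.3, 0.3)           # hypothetical normalised joint angles
update_towards(weights, state, target=1.0)
print(value(weights, state))            # estimate has moved from 0 towards 1
```

Because the kernels overlap, a single update also shifts the estimate at nearby joint configurations, which is one plausible reason such a scheme can generalise from few trials; how the paper derives continuous actions from the kernels is not specified in the abstract and is not modelled here.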