Emergence of safe behaviours with an intrinsic reward

  • Authors:
  • Yuri Gavshin; Maarja Kruusmaa

  • Affiliations:
  • Tallinn University of Technology, Centre for Biorobotics, Tallinn, Estonia (both authors)

  • Venue:
  • ICAIS'11: Proceedings of the Second International Conference on Adaptive and Intelligent Systems
  • Year:
  • 2011

Abstract

This paper explores the idea that robots can learn safe behaviors without prior knowledge of their environment or of the task at hand, using an intrinsic motivation to reverse actions. Our general idea is that if the robot learns to reverse its actions, all the behaviors that emerge from this principle are intrinsically safe. We validate this idea with experiments that benchmark the performance of an obstacle avoidance behavior, comparing our algorithm, based on an abstract intrinsic reward, with a Q-learning algorithm for obstacle avoidance based on an external reward signal. Finally, we demonstrate that the safety of learning can be increased further by first training the robot in a simulator using the intrinsic reward and then running the test with the real robot in the real environment. The experimental results show that the performance of the proposed algorithm is on average only 5-10% lower than that of the Q-learning algorithm. A physical robot using the knowledge obtained in simulation initially performs 10% worse in the real world than in simulation; however, its performance reaches the same success rate as the physically trained robot after a short learning period. We interpret this as evidence confirming the hypothesis that our learning algorithm can be used to teach safe behaviors to a robot.
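
The core mechanism described in the abstract is to replace a task-specific external reward with an intrinsic one that scores an action by whether it can be undone. The sketch below is a minimal illustration of that contrast under stated assumptions, not the authors' implementation: the names (ACTIONS, REVERSE, intrinsic_reward, QLearner), the discrete action set, and the tabular Q-learning setup are all introduced here for illustration.

```python
# Hypothetical sketch: a reversibility-based intrinsic reward plugged into
# an otherwise standard tabular Q-learner. Only the reward source differs
# between the two agents compared in the paper.
import random
from collections import defaultdict

ACTIONS = ["forward", "backward", "left", "right"]
REVERSE = {"forward": "backward", "backward": "forward",
           "left": "right", "right": "left"}

def intrinsic_reward(state_before, state_after_reverse):
    # +1 if executing the opposite action restored the original sensor
    # state (the action was reversible), -1 otherwise. Driving into an
    # obstacle typically makes a move irreversible, so obstacle avoidance
    # can emerge without "obstacle" ever appearing in the reward.
    return 1.0 if state_after_reverse == state_before else -1.0

class QLearner:
    """Plain tabular Q-learning with epsilon-greedy action selection."""
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)          # (state, action) -> value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next):
        # Standard one-step Q-learning backup.
        best_next = max(self.q[(s_next, a2)] for a2 in ACTIONS)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next
                                        - self.q[(s, a)])
```

In use, the intrinsic agent would execute an action, attempt REVERSE[action], and feed intrinsic_reward into QLearner.update, while the external-reward baseline would feed a collision-based penalty into the same update. The paper's comparison hinges on exactly this point: the two learners share the learning rule and differ only in where the scalar reward comes from, which also makes it natural to pretrain in simulation with the intrinsic reward and continue learning on the physical robot.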