Emergence of safe behaviours with an intrinsic reward

Authors:
Yuri Gavshin;Maarja Kruusmaa
Affiliations:
Tallinn University of Technology, Centre for Biorobotics, Tallinn, Estonia;Tallinn University of Technology, Centre for Biorobotics, Tallinn, Estonia
Venue:
ICAIS'11 Proceedings of the Second International Conference on Adaptive and Intelligent Systems
Year:
2011

Abstract

This paper explores the idea that a robot can learn safe behaviors without prior knowledge of its environment or of the task at hand, using an intrinsic motivation to reverse its actions. Our general idea is that if the robot learns to reverse its actions, all the behaviors that emerge from this principle are intrinsically safe. We validate this idea with experiments that benchmark the performance of obstacle avoidance behavior. We compare our algorithm, which is based on an abstract intrinsic reward, with a Q-learning algorithm for obstacle avoidance that is based on an external reward signal. Finally, we demonstrate that the safety of learning can be increased further by first training the robot in a simulator using the intrinsic reward and then running the test with the real robot in the real environment. The experimental results show that the performance of the proposed algorithm is on average only 5-10% lower than that of the Q-learning algorithm. A physical robot that uses the knowledge obtained in simulation performs 10% worse in the real world than in simulation. However, after a short learning period its performance reaches the same success rate as that of the physically trained robot. We interpret this as evidence confirming the hypothesis that our learning algorithm can be used to teach safe behaviors to a robot.
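The principle in the abstract, rewarding actions that can be undone, can be illustrated with a toy sketch. This is not the authors' algorithm: the one-dimensional corridor, the `REVERSE` table, the +1/-1 reward values, and the bootstrapping-free value update (a discount of zero, so simpler than full Q-learning) are all assumptions made for illustration. The sketch only shows how wall avoidance can fall out of a reversibility reward without any external "collision" signal.

```python
import random

N = 5                        # hypothetical corridor: cells 0..N-1, walls beyond both ends
ACTIONS = (-1, +1)           # step left, step right
REVERSE = {-1: +1, +1: -1}   # assumed: each action has a known reverse action


def step(state, action):
    """Move one cell; bumping into a wall leaves the robot where it is."""
    return min(max(state + action, 0), N - 1)


def intrinsic_reward(state, action):
    """+1 if the action followed by its reverse returns to the start, else -1."""
    after = step(state, action)
    back = step(after, REVERSE[action])
    return 1.0 if back == state else -1.0


def train(steps=2000, alpha=0.1, epsilon=0.2, seed=0):
    """Epsilon-greedy learning of one-step action values from the intrinsic reward."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    state = N // 2
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)                         # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])   # exploit
        reward = intrinsic_reward(state, action)
        # Move the estimate toward the observed reward (no bootstrapping).
        q[(state, action)] += alpha * (reward - q[(state, action)])
        state = step(state, action)
    return q


if __name__ == "__main__":
    q = train()
    for s in range(N):
        print(s, {a: round(q[(s, a)], 2) for a in ACTIONS})
```

In this toy setting, pushing into a wall is the only irreversible action: the robot stays in place, and the reverse action then carries it away from where it started. Those actions are the only ones that receive a negative reward, so a greedy policy over the learned values steers away from the walls.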