Anticipating Rewards in Continuous Time and Space: A Case Study in Developmental Robotics

  • Authors:
  • Arnaud J. Blanchard;Lola Cañamero

  • Affiliations:
  • Adaptive System Research Group, School of Computer Science, University of Hertfordshire, College Lane, Hatfield, Herts AL10 9AB, UK;Adaptive System Research Group, School of Computer Science, University of Hertfordshire, College Lane, Hatfield, Herts AL10 9AB, UK

  • Venue:
  • Anticipatory Behavior in Adaptive Learning Systems
  • Year:
  • 2007

Abstract

This paper presents the first basic principles, implementation, and experimental results of what could be regarded as a new approach to reinforcement learning, in which agents (physical robots interacting with objects and other agents in the real world) can learn to anticipate rewards using their sensory inputs. Our approach does not require discretization, a notion of events, or classification. Instead of learning rewards for every possible action of an agent in every situation, we propose that agents learn only the main situations worth avoiding and reaching. However, the main focus of our work is not reinforcement learning as such, but modeling cognitive development on a small autonomous robot interacting with an "adult" caretaker, typically a human, in the real world; the control architecture follows a Perception-Action approach incorporating a basic homeostatic principle. This interaction occurs in very close proximity, uses very coarse and limited sensory-motor capabilities, and affects the "well-being" and affective state of the robot. The type of anticipatory behavior we are concerned with in this context relates to both sensory and reward anticipation. We have applied and tested our model on a real robot.