Cultivating desired behaviour: policy teaching via environment-dynamics tweaks

  • Authors:
  • Zinovi Rabinovich;Lachlan Dufton;Kate Larson;Nicholas R. Jennings

  • Affiliations:
  • University of Southampton, United Kingdom;University of Waterloo, Canada;University of Waterloo, Canada;University of Southampton, United Kingdom

  • Venue:
  • Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Volume 1
  • Year:
  • 2010

Abstract

In this paper we study, for the first time explicitly, the implications of endowing an interested party (i.e. a teacher) with the ability to modify the underlying dynamics of the environment in order to encourage an agent to learn to follow a specific policy. We introduce a cost function that the teacher can use to balance the modifications it makes to the underlying environment dynamics against the learner's performance relative to some ideal, desired policy. We formulate the teacher's problem of determining optimal environment changes as a planning and control problem, and empirically validate the effectiveness of our model.
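
The abstract does not spell out the cost function or the planning formulation, so the following is only an illustrative sketch of the trade-off it describes, not the authors' model. It assumes a tiny tabular MDP, a learner that ends up acting greedily with respect to the optimal values of whatever dynamics it faces, an L1 distance as the measure of how much the teacher changed the dynamics, and a simple count of states where the learner deviates from the desired policy. All names (`greedy_policy`, `modification_size`, `teacher_cost`, `weight`) and all numbers are hypothetical.

```python
# Illustrative sketch only: a toy teacher cost that trades off the size of a
# dynamics modification against the learner's deviation from a target policy.
# P[s][a][t] is the probability of moving from state s to state t under action a.
# R[s][a] is the reward for taking action a in state s.

def q_value(P, R, V, s, a, gamma):
    """One-step lookahead value of action a in state s."""
    return R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(len(P)))

def greedy_policy(P, R, gamma=0.9, iters=200):
    """Assumed learner: value iteration, then act greedily in every state."""
    n_s, n_a = len(P), len(P[0])
    V = [0.0] * n_s
    for _ in range(iters):
        V = [max(q_value(P, R, V, s, a, gamma) for a in range(n_a))
             for s in range(n_s)]
    return [max(range(n_a), key=lambda a: q_value(P, R, V, s, a, gamma))
            for s in range(n_s)]

def modification_size(P, P_mod):
    """Assumed modification measure: L1 distance between transition tables."""
    return sum(abs(p - q)
               for sa, sa_mod in zip(P, P_mod)
               for row, row_mod in zip(sa, sa_mod)
               for p, q in zip(row, row_mod))

def teacher_cost(P, P_mod, R, target, weight):
    """Policy mismatch under the modified dynamics plus weighted modification size."""
    learned = greedy_policy(P_mod, R)
    mismatch = sum(1 for s in range(len(target)) if learned[s] != target[s])
    return mismatch + weight * modification_size(P, P_mod)

# Two states, two actions (0 = stay, 1 = move). Staying in state 1 pays more,
# but moving out of state 0 rarely succeeds under the original dynamics.
R = [[1.0, 0.0], [2.0, 0.0]]
P = [[[1.0, 0.0], [0.95, 0.05]],
     [[0.0, 1.0], [1.0, 0.0]]]

# A moderate tweak: the teacher makes "move" from state 0 succeed half the time.
P_small = [[[1.0, 0.0], [0.5, 0.5]],
           [[0.0, 1.0], [1.0, 0.0]]]

target = [1, 0]  # desired policy: move out of state 0, stay in state 1

cost_unmodified = teacher_cost(P, P, R, target, weight=0.1)
cost_small_tweak = teacher_cost(P, P_small, R, target, weight=0.1)
```

In this toy instance the learner stays in state 0 under the original dynamics, so leaving the environment untouched costs one policy mismatch, whereas the moderate tweak is enough to make the learner follow the target policy at a much smaller modification cost. Choosing the modification that minimises such a cost is the kind of optimisation the teacher faces; the paper itself frames it as a planning and control problem.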