Cultivating desired behaviour: policy teaching via environment-dynamics tweaks

Authors:
Zinovi Rabinovich;Lachlan Dufton;Kate Larson;Nicholas R. Jennings
Affiliations:
University of Southampton, United Kingdom;University of Waterloo, Canada;University of Waterloo, Canada;University of Southampton, United Kingdom
Venue:
Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Volume 1
Year:
2010

Abstract

In this paper we study, for the first time explicitly, the implications of endowing an interested party (i.e. a teacher) with the ability to modify the underlying dynamics of the environment in order to encourage an agent to learn to follow a specific policy. We introduce a cost function that the teacher can use to balance the modifications it makes to the underlying environment dynamics against the learner's performance relative to some ideal, desired policy. We formulate the teacher's problem of determining optimal environment changes as a planning and control problem, and empirically validate the effectiveness of our model.
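
The trade-off described in the abstract can be illustrated with a toy sketch. It is not the paper's formulation: the two-state MDP, the use of KL divergence as the modification cost, the count of mismatched states as the performance term, and every name (`teacher_cost`, `greedy_policy`, `weight`) are assumptions made for illustration only.

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions (q > 0 wherever p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def greedy_policy(P, R, gamma=0.9, iters=500):
    """Value iteration on kernel P[s][a][s'] and rewards R[s][a]; returns greedy actions."""
    n = len(P)
    V = [0.0] * n

    def q(s, a):
        return R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))

    for _ in range(iters):
        V = [max(q(s, a) for a in range(len(P[s]))) for s in range(n)]
    return [max(range(len(P[s])), key=lambda a: q(s, a)) for s in range(n)]

def teacher_cost(P_orig, P_mod, R, target, weight=1.0):
    """Hypothetical teacher cost: size of the dynamics change plus a weighted
    penalty for each state where the learner deviates from the target policy."""
    change = sum(
        kl(P_mod[s][a], P_orig[s][a])
        for s in range(len(P_orig))
        for a in range(len(P_orig[s]))
    )
    learned = greedy_policy(P_mod, R)
    mismatch = sum(1 for s, a in enumerate(learned) if a != target[s])
    return change + weight * mismatch, learned

# Assumed toy environment: being in state 1 pays reward 1, state 0 pays 0.
R = [[0.0, 0.0], [1.0, 1.0]]
P = [
    [[0.2, 0.8], [0.8, 0.2]],  # state 0: action 0 usually reaches state 1
    [[0.2, 0.8], [0.8, 0.2]],  # state 1: action 0 usually stays in state 1
]
target = [1, 0]  # the teacher wants action 1 in state 0

# Teacher tweak: make action 1 in state 0 more likely to reach state 1.
P_mod = [[[0.2, 0.8], [0.1, 0.9]], P[1]]

cost_none, policy_none = teacher_cost(P, P, R, target, weight=5.0)
cost_mod, policy_mod = teacher_cost(P, P_mod, R, target, weight=5.0)
```

Under these assumptions, leaving the dynamics untouched costs nothing in modifications but pays the mismatch penalty, while the tweak pays a modification cost and brings the learner's policy in line with the target. The `weight` parameter decides which of the two the teacher prefers.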