Policy teaching through reward function learning

Authors:
Haoqi Zhang;David C. Parkes;Yiling Chen
Affiliations:
Harvard University, Cambridge, MA, USA;Harvard University, Cambridge, MA, USA;Harvard University, Cambridge, MA, USA
Venue:
Proceedings of the 10th ACM conference on Electronic commerce
Year:
2009

Abstract

Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence and present a polynomial-time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs and demonstrate their effectiveness on a policy teaching problem in a simulated ad-network setting. Finally, we extend our methods to handle partial observations and partial target policies, and we provide a game-theoretic interpretation of our methods for handling strategic agents.
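
To make the known-reward setting concrete, the following is a minimal sketch of what it means for an incentive to induce a target policy. It does not reproduce the paper's linear program. It checks a candidate incentive by adding it to the agent's reward and solving the modified MDP with value iteration. The two-state MDP, the discount factor, and all names (`P`, `R`, `TARGET`, `with_incentive`, `optimal_policy`, `induces`) are hypothetical and chosen only for illustration.

```python
# Toy illustration of policy teaching with a known reward function.
# Everything here (MDP, rewards, incentive values) is assumed, not from the paper.

GAMMA = 0.9  # discount factor (assumed)

# P[s][a] is a list of (next_state, probability) pairs.
# Action 0 always leads to state 0; action 1 always leads to state 1.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}

# Agent's intrinsic reward R[s][a].
R = {
    0: {0: 1.0, 1: 0.0},
    1: {0: 0.0, 1: 0.5},
}


def with_incentive(reward, delta):
    """Return the reward after adding incentive delta[(s, a)] where given."""
    return {
        s: {a: reward[s][a] + delta.get((s, a), 0.0) for a in reward[s]}
        for s in reward
    }


def optimal_policy(reward, iters=500):
    """Greedy policy after value iteration on the given reward."""
    V = {s: 0.0 for s in P}

    def q(s, a):
        return reward[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])

    for _ in range(iters):
        V = {s: max(q(s, a) for a in P[s]) for s in P}
    return {s: max(P[s], key=lambda a: q(s, a)) for s in P}


def induces(delta, target):
    """True if the agent's optimal policy under R + delta equals target."""
    return optimal_policy(with_incentive(R, delta)) == target


# Desired policy: move to state 1 and stay there.
TARGET = {0: 1, 1: 1}

print(optimal_policy(R))               # agent's policy with no incentive
print(induces({(1, 1): 0.5}, TARGET))  # a smaller incentive on (state 1, action 1)
print(induces({(1, 1): 1.0}, TARGET))  # a larger incentive on the same pair
```

A check like this only verifies one candidate incentive at a time. A linear program, as described in the abstract, would instead search over incentives subject to the limits on what the interested party can provide.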