Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent's decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent's reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former and formulating an active, indirect elicitation method for the latter. We provide conditions under which elicitation achieves logarithmic convergence, and present a polynomial-time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad-network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling strategic agents.
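The abstract does not spell out the linear program or the elicitation procedure, so the following is only an illustrative sketch of the two ideas on a hypothetical two-state MDP, under assumptions of its own: rewards and incentives are paid per state, the target action must win by a fixed margin `EPS`, and a brute-force grid search stands in for the paper's linear program in the known-reward case. The `elicit` loop keeps a finite list of candidate reward functions, offers the incentive that would work for one of them, observes how the agent actually responds, and discards every candidate that predicts a different response. This is a simplified stand-in for active indirect elicitation, not the paper's algorithm; in particular it does not implement the polynomial-time method with high-probability logarithmic convergence. All function names, the MDP, and the numeric values are hypothetical.

```python
from itertools import product

GAMMA = 0.9   # discount factor (assumed for this toy example)
EPS = 0.01    # margin by which the target action must win (assumed)

# Hypothetical 2-state MDP: action 0 = "stay", action 1 = "move to the other state".
# P[s][a] lists (next_state, probability) pairs; rewards and incentives are per state.
P = [
    [[(0, 1.0)], [(1, 1.0)]],
    [[(1, 1.0)], [(0, 1.0)]],
]
N_STATES, N_ACTIONS = len(P), len(P[0])


def q_value(v, reward, s, a):
    """One-step lookahead: reward[s] + GAMMA * E[v(next state)]."""
    return reward[s] + GAMMA * sum(p * v[t] for t, p in P[s][a])


def evaluate(policy, reward, tol=1e-12):
    """Iterative policy evaluation of V^policy."""
    v = [0.0] * N_STATES
    while True:
        new_v = [q_value(v, reward, s, policy[s]) for s in range(N_STATES)]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def optimal_policy(reward, tol=1e-12):
    """Value iteration, then act greedily: models how the agent responds."""
    v = [0.0] * N_STATES
    while True:
        new_v = [max(q_value(v, reward, s, a) for a in range(N_ACTIONS))
                 for s in range(N_STATES)]
        done = max(abs(x - y) for x, y in zip(new_v, v)) < tol
        v = new_v
        if done:
            break
    return tuple(max(range(N_ACTIONS), key=lambda a: q_value(v, reward, s, a))
                 for s in range(N_STATES))


def induces(target, reward):
    """True if, under V^target, target[s] beats every other action by EPS in
    every state, which makes target the unique optimal policy."""
    v = evaluate(target, reward)
    return all(q_value(v, reward, s, target[s]) >= q_value(v, reward, s, a) + EPS
               for s in range(N_STATES) for a in range(N_ACTIONS) if a != target[s])


def cheapest_incentive(agent_reward, target, start=0, grid=(0.0, 0.5, 1.0, 1.5, 2.0)):
    """Brute-force stand-in for the known-reward LP (tiny MDPs only): search
    nonnegative per-state incentives on a grid for the one that induces target
    at the lowest expected discounted payment from `start`."""
    best = None
    for delta in product(grid, repeat=N_STATES):
        combined = [r + d for r, d in zip(agent_reward, delta)]
        if induces(target, combined):
            cost = evaluate(target, delta)[start]
            if best is None or cost < best[1]:
                best = (delta, cost)
    return best  # (delta, cost), or None if no grid point works


def elicit(candidates, true_reward, target, max_rounds=10):
    """Finite-hypothesis sketch of indirect elicitation (not the paper's method)."""
    candidates = list(candidates)
    for _ in range(max_rounds):
        if not candidates:
            return None
        offer = cheapest_incentive(candidates[0], target)  # naive choice rule (assumed)
        if offer is None:
            candidates.pop(0)
            continue
        delta = offer[0]
        observed = optimal_policy([r + d for r, d in zip(true_reward, delta)])
        if observed == tuple(target):
            return delta
        # Keep only candidates that predict the response we actually observed.
        candidates = [h for h in candidates
                      if optimal_policy([r + d for r, d in zip(h, delta)]) == observed]
    return None


target = (1, 0)  # move to state 1, then stay there
print(induces(target, [1.0, 0.0]))             # False: no incentive, agent prefers state 0
print(cheapest_incentive([1.0, 0.0], target))  # delta (0.0, 1.5), cost about 13.5
print(elicit([[0.0, 0.0], [0.8, 0.0], [1.2, 0.0]], [1.2, 0.0], target))  # (0.0, 1.5)
```

In this toy run the first offer is sized for the wrong candidate, the agent keeps its old behavior, and each observed response rules out at least the candidate the offer was sized for, so the loop reaches a working incentive after three offers. A real implementation would replace the grid search with an LP solver and choose offers more carefully than "first remaining candidate".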