Markov Decision Processes: Discrete Stochastic Dynamic Programming.
Computers and Intractability: A Guide to the Theory of NP-Completeness.
Learning an Agent's Utility Function by Observing Behavior. ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning.
Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning.
Algorithms for Inverse Reinforcement Learning. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning.
Making Rational Decisions Using Adaptive Utility Elicitation. Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence.
Proceedings of the 4th ACM conference on Electronic commerce.
Hidden-action in multi-hop routing. Proceedings of the 6th ACM conference on Electronic commerce.
Preference elicitation for interface optimization. Proceedings of the 18th annual ACM symposium on User interface software and technology.
EC '06 Proceedings of the 7th ACM conference on Electronic commerce.
Bayesian inverse reinforcement learning. IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence.
Incremental utility elicitation with minimax regret decision criterion. IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence.
Regret-based utility elicitation in constraint-based decision problems. IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence.
Computational challenges in e-commerce. Communications of the ACM.
Policy teaching through reward function learning. Proceedings of the 10th ACM conference on Electronic commerce.
The role of game theory in human computation systems. Proceedings of the ACM SIGKDD Workshop on Human Computation.
A general approach to environment design with one agent. IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence.
Regret-based reward elicitation for Markov decision processes. UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence.
Toward automatic task design: a progress report. Proceedings of the ACM SIGKDD Workshop on Human Computation.
Cultivating desired behaviour: policy teaching via environment-dynamics tweaks. Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1.
Inducing desirable behaviour through an incentives infrastructure. MATES'10 Proceedings of the 8th German conference on Multiagent system technologies.
Incentive design for adaptive agents. The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 2.
Multiagent environment design in human computation. The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3.
Persuading agents to act in the right way: An incentive-based approach. Engineering Applications of Artificial Intelligence.
Bayesian interaction shaping: learning to influence strategic interactions in mixed robotic domains. Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems.
Many situations arise in which an interested party's utility depends on the actions of an agent; e.g., a teacher is interested in a student learning effectively, and a firm is interested in a consumer's behavior. We consider an environment in which the interested party can provide incentives to affect the agent's actions but cannot otherwise enforce actions. In value-based policy teaching, we situate this within the framework of sequential decision tasks modeled by Markov Decision Processes, and seek to associate limited rewards with states that induce the agent to follow a policy maximizing the total expected value of the interested party. We show that value-based policy teaching is NP-hard and provide a mixed integer program formulation. Focusing in particular on environments in which the agent's reward function is unknown to the interested party, we provide a method for active indirect elicitation, wherein the agent's reward function is inferred from observations of its response to incentives. Experimental results suggest that we can generally find the optimal incentive provision in a small number of elicitation rounds.
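The core setup in the abstract can be illustrated with a small, self-contained sketch: the agent best-responds (via value iteration) to its own reward plus state-based incentives, and the interested party searches over budget-limited incentive vectors for the one whose induced policy maximizes the party's value. This is only a toy stand-in for the paper's approach: the function names (`teach`, `value_iteration`, `policy_value`), the toy MDP, and the brute-force enumeration over incentive levels are all assumptions for illustration; the paper instead solves a mixed integer program, and this brute force is exponential in the number of states. The budget here limits only the sum of per-state incentive levels, a simplification of any real payment accounting.

```python
import itertools
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=500):
    """Greedy policy of an agent with transitions P[a][s, s'] and reward R[s, a]."""
    nS, nA = R.shape
    V = np.zeros(nS)
    for _ in range(iters):
        Q = np.stack([R[:, a] + gamma * (P[a] @ V) for a in range(nA)], axis=1)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)  # agent's best-response policy

def policy_value(P, R, pi, gamma=0.9):
    """Exact value of a fixed policy pi under reward R (linear solve)."""
    nS = len(pi)
    P_pi = np.array([P[pi[s]][s] for s in range(nS)])
    R_pi = np.array([R[s, pi[s]] for s in range(nS)])
    return np.linalg.solve(np.eye(nS) - gamma * P_pi, R_pi)

def teach(P, R_agent, R_party, budget, levels, start=0, gamma=0.9):
    """Brute-force value-based policy teaching (illustrative stand-in for the MIP):
    search per-state incentives delta with sum(delta) <= budget for the vector
    whose induced agent policy maximizes the interested party's value."""
    nS = R_agent.shape[0]
    best = (-np.inf, None, None)
    for delta in itertools.product(levels, repeat=nS):
        if sum(delta) > budget:
            continue
        R_inc = R_agent + np.array(delta)[:, None]  # state incentive, any action
        pi = value_iteration(P, R_inc, gamma)       # agent best-responds
        val = policy_value(P, R_party, pi, gamma)[start]
        if val > best[0]:
            best = (val, delta, pi)
    return best

# Toy instance (hypothetical): from state 0 the agent can move to absorbing
# state 1 (action 0) or absorbing state 2 (action 1). The agent prefers
# state 2; the interested party earns reward only while the agent is in state 1.
P = [np.array([[0, 1, 0], [0, 1, 0], [0, 0, 1]], float),   # action 0
     np.array([[0, 0, 1], [0, 1, 0], [0, 0, 1]], float)]   # action 1
R_agent = np.array([[0.0, 0.5], [0.0, 0.0], [0.0, 0.0]])
R_party = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
```

On this instance, the unincentivized agent takes action 1 toward state 2, giving the party zero value; a small incentive attached to state 1 flips the agent's choice, which is exactly the induced-policy effect the abstract describes.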