Markov Decision Processes: Discrete Stochastic Dynamic Programming.
Computers and Intractability: A Guide to the Theory of NP-Completeness.
Learning an Agent's Utility Function by Observing Behavior. ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning.
Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning.
Algorithms for Inverse Reinforcement Learning. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning.
Making Rational Decisions Using Adaptive Utility Elicitation. Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence.
Proceedings of the 4th ACM conference on Electronic commerce.
Hidden-action in multi-hop routing. Proceedings of the 6th ACM conference on Electronic commerce.
Preference elicitation for interface optimization. Proceedings of the 18th annual ACM symposium on User interface software and technology.
EC '06 Proceedings of the 7th ACM conference on Electronic commerce.
Bayesian inverse reinforcement learning. IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence.
Incremental utility elicitation with minimax regret decision criterion. IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence.
Regret-based utility elicitation in constraint-based decision problems. IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence.
Computational challenges in e-commerce. Communications of the ACM.
Policy teaching through reward function learning. Proceedings of the 10th ACM conference on Electronic commerce.
The role of game theory in human computation systems. Proceedings of the ACM SIGKDD Workshop on Human Computation.
A general approach to environment design with one agent. IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence.
Regret-based reward elicitation for Markov decision processes. UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence.
Toward automatic task design: a progress report. Proceedings of the ACM SIGKDD Workshop on Human Computation.
Cultivating desired behaviour: policy teaching via environment-dynamics tweaks. Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1.
Inducing desirable behaviour through an incentives infrastructure. MATES'10 Proceedings of the 8th German conference on Multiagent system technologies.
Incentive design for adaptive agents. The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 2.
Multiagent environment design in human computation. The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3.
Persuading agents to act in the right way: An incentive-based approach. Engineering Applications of Artificial Intelligence.
Bayesian interaction shaping: learning to influence strategic interactions in mixed robotic domains. Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems.
Many situations arise in which an interested party's utility depends on the actions of an agent; e.g., a teacher is interested in a student learning effectively, and a firm is interested in a consumer's behavior. We consider an environment in which the interested party can provide incentives to affect the agent's actions but cannot otherwise enforce actions. In value-based policy teaching, we situate this within the framework of sequential decision tasks modeled by Markov Decision Processes, and seek to associate limited rewards with states that induce the agent to follow a policy maximizing the total expected value of the interested party. We show that value-based policy teaching is NP-hard and provide a mixed integer program formulation. Focusing in particular on environments in which the agent's reward function is unknown to the interested party, we provide a method for active indirect elicitation, wherein the agent's reward function is inferred from observations of its response to incentives. Experimental results suggest that we can generally find the optimal incentive provision in a small number of elicitation rounds.
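The core setup in the abstract can be illustrated with a small, self-contained sketch: the agent best-responds (via value iteration) to its own reward plus state-based incentives, and the interested party searches over budget-limited incentive vectors for the one whose induced policy maximizes the party's value. This is only a toy stand-in for the paper's approach: the function names (`teach`, `value_iteration`, `policy_value`), the toy MDP, and the brute-force enumeration over incentive levels are all assumptions for illustration; the paper instead solves a mixed integer program, and this brute force is exponential in the number of states. The budget here limits only the sum of per-state incentive levels, a simplification of any real payment accounting.

```python
import itertools
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=500):
    """Greedy policy of an agent with transitions P[a][s, s'] and reward R[s, a]."""
    nS, nA = R.shape
    V = np.zeros(nS)
    for _ in range(iters):
        Q = np.stack([R[:, a] + gamma * (P[a] @ V) for a in range(nA)], axis=1)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)  # agent's best-response policy

def policy_value(P, R, pi, gamma=0.9):
    """Exact value of a fixed policy pi under reward R (linear solve)."""
    nS = len(pi)
    P_pi = np.array([P[pi[s]][s] for s in range(nS)])
    R_pi = np.array([R[s, pi[s]] for s in range(nS)])
    return np.linalg.solve(np.eye(nS) - gamma * P_pi, R_pi)

def teach(P, R_agent, R_party, budget, levels, start=0, gamma=0.9):
    """Brute-force value-based policy teaching (illustrative stand-in for the MIP):
    search per-state incentives delta with sum(delta) <= budget for the vector
    whose induced agent policy maximizes the interested party's value."""
    nS = R_agent.shape[0]
    best = (-np.inf, None, None)
    for delta in itertools.product(levels, repeat=nS):
        if sum(delta) > budget:
            continue
        R_inc = R_agent + np.array(delta)[:, None]  # state incentive, any action
        pi = value_iteration(P, R_inc, gamma)       # agent best-responds
        val = policy_value(P, R_party, pi, gamma)[start]
        if val > best[0]:
            best = (val, delta, pi)
    return best

# Toy instance (hypothetical): from state 0 the agent can move to absorbing
# state 1 (action 0) or absorbing state 2 (action 1). The agent prefers
# state 2; the interested party earns reward only while the agent is in state 1.
P = [np.array([[0, 1, 0], [0, 1, 0], [0, 0, 1]], float),   # action 0
     np.array([[0, 0, 1], [0, 1, 0], [0, 0, 1]], float)]   # action 1
R_agent = np.array([[0.0, 0.5], [0.0, 0.0], [0.0, 0.0]])
R_party = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
```

On this instance, the unincentivized agent takes action 1 toward state 2, giving the party zero value; a small incentive attached to state 1 flips the agent's choice, which is exactly the induced-policy effect the abstract describes.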