We consider a setting in which a principal seeks to induce an adaptive agent to select a target action by providing incentives on one or more actions. The agent maintains a belief about the value of each action, which may be updated based on experience, and at each time step selects the action with the maximal sum of believed value and associated incentive. The principal observes the agent's selections but has no information about the agent's current beliefs or its belief-update process. To induce the target action as soon as possible, or as often as possible over a fixed time period, it is optimal for a principal with a per-period budget to assign the entire budget to the target action and wait for the agent to choose it. With an across-period budget, however, no algorithm can provide good performance on all instances without knowledge of the agent's update process, except in the particular case in which the goal is to induce the agent to select the target action just once. We show how knowledge about the agent's beliefs can overcome this strong negative result, providing a tractable algorithm for the offline problem when the principal has perfect knowledge, and an analytical solution for an instance of the problem in which only partial knowledge is available.
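
The agent's decision rule and the per-period-budget strategy described above are easy to make concrete. Below is a minimal simulation sketch, not the paper's algorithm: the names (`induce_target`, `decay_update`) and the specific belief-update rule are illustrative assumptions, since the abstract leaves the agent's update process unspecified.

```python
def induce_target(beliefs, agent_update, target, budget, horizon):
    """Simulate the per-period-budget strategy sketched in the abstract:
    place the whole per-period budget on the target action each round
    and wait for the agent to prefer it.

    beliefs:      dict mapping action -> agent's current believed value
    agent_update: callable(beliefs, chosen), the agent's belief-update
                  rule (unknown to the principal)
    target:       action the principal wants selected
    budget:       per-period incentive budget
    horizon:      number of rounds to simulate
    Returns the rounds in which the target action was selected.
    """
    hits = []
    for t in range(horizon):
        # Principal's incentives: the entire budget goes on the target action.
        incentive = {a: (budget if a == target else 0.0) for a in beliefs}
        # Agent picks the action with maximal believed value + incentive.
        chosen = max(beliefs, key=lambda a: beliefs[a] + incentive[a])
        if chosen == target:
            hits.append(t)
        # The agent updates its beliefs from experience; the principal
        # observes only `chosen`, never the beliefs or the update rule.
        agent_update(beliefs, chosen)
    return hits


def decay_update(beliefs, chosen):
    # Toy update rule (an assumption, not from the paper): the value of
    # the action just taken decays, so the agent eventually switches.
    beliefs[chosen] *= 0.9


print(induce_target({"a": 1.0, "b": 0.4}, decay_update,
                    target="b", budget=0.3, horizon=10))
# -> [4, 5, 7, 8]: the agent initially prefers "a", and the subsidized
#    target "b" starts winning once "a"'s believed value has decayed.
```

Under this toy dynamic the principal never needs to know the update rule: placing the budget on the target and waiting is optimal for the per-period case, exactly the behavior the abstract attributes to that setting.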