We consider a setting in which a principal seeks to induce an adaptive agent to select a target action by providing incentives on one or more actions. The agent maintains a belief about the value of each action, which may be updated based on experience, and at each time step selects the action with the maximal sum of believed value and associated incentive. The principal observes the agent's selections but has no information about the agent's current beliefs or its belief-update process. To induce the target action as soon as possible, or as often as possible over a fixed time period, it is optimal for a principal with a per-period budget to assign the entire budget to the target action and wait for the agent to choose it. With an across-period budget, however, no algorithm can provide good performance on all instances without knowledge of the agent's update process, except in the particular case in which the goal is to induce the agent to select the target action just once. We show how knowledge about the agent's beliefs can overcome this strong negative result, providing a tractable algorithm for the offline problem when the principal has perfect knowledge, and an analytical solution for an instance of the problem in which only partial knowledge is available.
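
The agent's decision rule and the per-period-budget strategy described above are easy to make concrete. Below is a minimal simulation sketch, not the paper's algorithm: the names (`induce_target`, `decay_update`) and the specific belief-update rule are illustrative assumptions, since the abstract leaves the agent's update process unspecified.

```python
def induce_target(beliefs, agent_update, target, budget, horizon):
    """Simulate the per-period-budget strategy sketched in the abstract:
    place the whole per-period budget on the target action each round
    and wait for the agent to prefer it.

    beliefs:      dict mapping action -> agent's current believed value
    agent_update: callable(beliefs, chosen), the agent's belief-update
                  rule (unknown to the principal)
    target:       action the principal wants selected
    budget:       per-period incentive budget
    horizon:      number of rounds to simulate
    Returns the rounds in which the target action was selected.
    """
    hits = []
    for t in range(horizon):
        # Principal's incentives: the entire budget goes on the target action.
        incentive = {a: (budget if a == target else 0.0) for a in beliefs}
        # Agent picks the action with maximal believed value + incentive.
        chosen = max(beliefs, key=lambda a: beliefs[a] + incentive[a])
        if chosen == target:
            hits.append(t)
        # The agent updates its beliefs from experience; the principal
        # observes only `chosen`, never the beliefs or the update rule.
        agent_update(beliefs, chosen)
    return hits


def decay_update(beliefs, chosen):
    # Toy update rule (an assumption, not from the paper): the value of
    # the action just taken decays, so the agent eventually switches.
    beliefs[chosen] *= 0.9


print(induce_target({"a": 1.0, "b": 0.4}, decay_update,
                    target="b", budget=0.3, horizon=10))
# -> [4, 5, 7, 8]: the agent initially prefers "a", and the subsidized
#    target "b" starts winning once "a"'s believed value has decayed.
```

Under this toy dynamic the principal never needs to know the update rule: placing the budget on the target and waiting is optimal for the per-period case, exactly the behavior the abstract attributes to that setting.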