Policy teaching through reward function learning
Proceedings of the 10th ACM Conference on Electronic Commerce
Dynamic Supplier Contracts Under Asymmetric Inventory Information
Operations Research
Optimal Selling Scheme for Heterogeneous Consumers with Uncertain Valuations
Mathematics of Operations Research
Analysis of a Dynamic Adverse Selection Model with Asymptotic Efficiency
Mathematics of Operations Research
Solving an Infinite Horizon Adverse Selection Model Through Finite Policy Graphs
Operations Research
This paper proposes a general framework for a large class of multiperiod principal-agent problems in which a principal has a primary stake in the performance of a system but delegates its control to an agent. The underlying system is a Markov decision process whose state is observed only by the agent, while the agent's actions are observed by both parties. The paper develops a dynamic programming algorithm that derives optimal long-term contracts for the principal, who controls the underlying system indirectly by offering the agent a menu of continuation-utility vectors along public information paths; the agent's best response, expressed through his choice of continuation utilities, induces truthful state revelation and results in actions that maximize the principal's expected payoff. The problem is meaningful to the operations research community because it can be framed as optimally designing the reward structure of a Markov decision process with hidden states, and it has many applications of interest, as discussed in the paper.
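The reward-design view in the abstract can be sketched with a toy example: the principal picks a payment schedule for a small MDP, the agent best-responds by value iteration, and the principal keeps the schedule that maximizes her own payoff. This is only a minimal illustration under invented numbers (the transition matrix, outputs, costs, and payment grid are all hypothetical), and it deliberately omits the paper's key ingredient, incentive-compatible revelation of the hidden state via menus of continuation utilities, by letting the principal observe the state directly.

```python
import itertools
import numpy as np

# Toy delegated-control MDP: 2 states, 2 actions (all numbers hypothetical).
# P[a][s][t]: transition probability, y[s][a]: output to the principal,
# c[a]: agent's effort cost, beta: common discount factor.
P = np.array([[[0.9, 0.1], [0.6, 0.4]],    # action 0
              [[0.5, 0.5], [0.2, 0.8]]])   # action 1
y = np.array([[1.0, 2.0], [3.0, 6.0]])     # y[s, a]
c = np.array([0.0, 1.0])
beta = 0.9

def agent_best_response(w):
    """Agent's optimal policy (one action per state) via value iteration,
    given a payment schedule w[s, a]."""
    V = np.zeros(2)
    for _ in range(500):
        # Q[s, a] = payment - effort cost + discounted continuation value
        Q = w - c[None, :] + beta * np.einsum('ast,t->sa', P, V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def principal_value(w, pi):
    """Principal's discounted payoff (output minus payments) from state 0,
    when the agent follows policy pi."""
    V = np.zeros(2)
    for _ in range(500):
        V = np.array([y[s, pi[s]] - w[s, pi[s]] + beta * P[pi[s], s] @ V
                      for s in range(2)])
    return V[0]

# Outer design problem: brute-force search over a coarse grid of payment
# schedules, anticipating the agent's best response to each.
best = max(
    (principal_value(np.array(w).reshape(2, 2),
                     agent_best_response(np.array(w).reshape(2, 2))), w)
    for w in itertools.product([0.0, 0.5, 1.0, 1.5], repeat=4)
)
print(round(best[0], 3), best[1])
```

The nested structure mirrors the abstract: an inner MDP solved by the agent and an outer reward-design problem solved by the principal. The paper's actual algorithm replaces the brute-force outer search with dynamic programming over continuation utilities and must additionally enforce truthful reporting of the hidden state.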