An improved policy iteration algorithm for partially observable MDPs
NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Dynamic Supplier Contracts Under Asymmetric Inventory Information
Operations Research
Analysis of a Dynamic Adverse Selection Model with Asymptotic Efficiency
Mathematics of Operations Research
This paper studies an infinite-horizon adverse selection model with an underlying Markov information process. It introduces a graphical representation of continuation contracts and continuation payoff frontiers, the finite policy graph, and provides an algorithm that approximates the optimal policy graph through iteration. After each value iteration, the algorithm performs an additional step: dominated points on the previous continuation payoff frontier are replaced by points on the new frontier, and the new frontier is then reevaluated. This dominance-free reevaluation step accelerates the convergence of the continuation payoff frontiers. Numerical examples demonstrate the effectiveness of the algorithm and illustrate properties of the optimal contracts.
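The dominance check at the heart of the reevaluation step can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each frontier point is a pair (agent payoff, principal payoff) and that a point is dominated when another distinct point is weakly better in both coordinates.

```python
def prune_dominated(points):
    """Keep only points not dominated by another point on the frontier.

    Each point is a tuple (agent_payoff, principal_payoff); a point p
    is dominated if some distinct point q satisfies q >= p componentwise.
    This is an illustrative sketch, not the algorithm from the paper;
    it assumes the frontier contains no duplicate points.
    """
    kept = []
    for p in points:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] >= p[1]
            for q in points
        )
        if not dominated:
            kept.append(p)
    return kept

# Toy frontier: the last two points are dominated and get discarded.
frontier = [(1.0, 3.0), (2.0, 2.5), (2.0, 2.0), (0.5, 1.0)]
print(prune_dominated(frontier))  # -> [(1.0, 3.0), (2.0, 2.5)]
```

In the algorithm described above, a pruning pass of this kind would run between value iterations, so that only undominated continuation payoffs carry over to the next reevaluation of the frontier.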