Robust Modified Policy Iteration

  • Authors:
  • David L. Kaufman; Andrew J. Schaefer

  • Affiliations:
  • Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109; Department of Industrial Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261

  • Venue:
  • INFORMS Journal on Computing
  • Year:
  • 2013

Abstract

Robust dynamic programming (robust DP) mitigates the effects of ambiguity in transition probabilities on the solutions of Markov decision problems. We consider the computation of robust DP solutions for discrete-stage, infinite-horizon, discounted problems with finite state and action spaces. We present robust modified policy iteration (RMPI) and demonstrate its convergence. RMPI encompasses both of the previously known algorithms: robust value iteration and robust policy iteration. In addition to proposing exact RMPI, in which the "inner problem" is solved precisely, we propose inexact RMPI, in which the inner problem is solved only to within a specified tolerance. We also introduce new stopping criteria based on the span seminorm. Finally, we demonstrate through numerical studies that RMPI can significantly reduce computation time.
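The algorithm described above can be illustrated with a minimal sketch. This is not the paper's implementation: for simplicity it models the transition-probability uncertainty set as a finite list of transition tensors (so the inner problem reduces to a minimum over models and is solved exactly), uses a fixed number `m` of partial robust policy-evaluation steps per iteration, and stops when the span seminorm sp(x) = max(x) − min(x) of successive value differences falls below a tolerance. All names (`robust_mpi`, `robust_backup`) are illustrative.

```python
import numpy as np

def robust_mpi(P_models, R, gamma=0.9, m=20, tol=1e-6, max_iter=1000):
    """Sketch of robust modified policy iteration (RMPI).

    P_models: list of transition tensors, each of shape (A, S, S); the
              ambiguity set is modeled here as this finite list (an
              illustrative simplification -- the paper allows richer sets
              and an inexactly solved inner problem).
    R: reward matrix of shape (S, A).
    Returns a robust value function (shape (S,)) and a greedy policy.
    """
    S, A = R.shape

    def robust_backup(v):
        # Inner problem: worst case over models of expected next value.
        # Each P @ v has shape (A, S); take the elementwise minimum.
        worst = np.min(np.stack([P @ v for P in P_models]), axis=0)
        return R + gamma * worst.T          # Q-values, shape (S, A)

    v = np.zeros(S)
    policy = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        Q = robust_backup(v)
        policy = Q.argmax(axis=1)           # robust policy improvement
        v_new = Q.max(axis=1)
        diff = v_new - v
        if diff.max() - diff.min() < tol:   # span-seminorm stopping rule
            v = v_new
            break
        v = v_new
        # m steps of partial robust evaluation of the greedy policy
        # (m = 0 recovers robust value iteration; m -> infinity recovers
        # robust policy iteration)
        for _ in range(m):
            Q = robust_backup(v)
            v = Q[np.arange(S), policy]
    return v, policy
```

The `m` parameter interpolates between the two previously known algorithms the abstract mentions, which is the sense in which RMPI "encompasses" robust value iteration and robust policy iteration.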