Simplifying Optimal Strategies in Stochastic Games

  • Authors:
  • J. Flesch, F. Thuijsman, O. J. Vrieze

  • Venue:
  • SIAM Journal on Control and Optimization
  • Year:
  • 1998

Abstract

We deal with zero-sum limiting average stochastic games. We show that the existence of arbitrary optimal strategies implies the existence of stationary $\varepsilon$-optimal strategies, for all $\varepsilon > 0$, and the existence of Markov optimal strategies. We present a construction of such strategies for which the optimal strategies do not even need to be known. Furthermore, an example demonstrates that the existence of stationary optimal strategies is not implied by the existence of optimal strategies, so the result is sharp.

More generally, one can evaluate a strategy $\pi$ for the maximizing player, player 1, by the reward $\phi_s(\pi)$ that $\pi$ guarantees to him when starting in state $s$. A strategy $\pi$ is called nonimproving if $\phi_s(\pi) \geq \phi_s(\pi[h])$ for all $s$ and for all finite histories $h$ with final state $s$, where $\pi[h]$ is the strategy $\pi$ conditional on the history $h$. Using the evaluation $\phi$, we may define the relation "$\varepsilon$-better" between strategies: a strategy $\pi^1$ is called $\varepsilon$-better than $\pi^2$ if $\phi_s(\pi^1) \geq \phi_s(\pi^2) - \varepsilon$ for all $s$. We show that for any nonimproving strategy $\pi$ and for all $\varepsilon > 0$, there exists an $\varepsilon$-better stationary strategy, as well as a (0-)better Markov strategy. Since all optimal strategies are nonimproving, this result can be regarded as a generalization of the above result for optimal strategies.

Finally, we briefly discuss some other extensions. Among other things, we indicate possible simplifications of strategies that are optimal only for particular initial states, by "almost stationary" $\varepsilon$-optimal strategies, for all $\varepsilon > 0$, and by "almost Markov" optimal strategies. We also discuss the validity of the above results for other reward functions. Several examples clarify these issues.
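
For reference, the objects used in the abstract can be written out explicitly. The following is a minimal sketch, assuming the standard limiting average (undiscounted) criterion over the stage payoffs; the stage payoff $r(S_t,A_t,B_t)$ and the opponent strategy $\sigma$ are notation introduced here for illustration and are not taken from the paper.

% Guaranteed limiting average reward of player 1's strategy \pi from initial state s
% (worst case over all strategies \sigma of player 2) -- standard criterion, assumed here.
\[
  \phi_s(\pi) \;=\; \inf_{\sigma}\,
  \liminf_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}_{s,\pi,\sigma}\!\left[\sum_{t=1}^{T} r(S_t, A_t, B_t)\right]
\]
% \pi is nonimproving if conditioning on any finite history h ending in state s never raises the guarantee:
\[
  \phi_s(\pi) \;\geq\; \phi_s(\pi[h]) \qquad \text{for all } s \text{ and all finite } h \text{ with final state } s
\]
% \pi^1 is \varepsilon-better than \pi^2 if it guarantees at least as much, up to \varepsilon, from every state:
\[
  \phi_s(\pi^1) \;\geq\; \phi_s(\pi^2) - \varepsilon \qquad \text{for all } s
\]

In this notation, the main result stated above says that every nonimproving strategy $\pi$ admits, for each $\varepsilon > 0$, a stationary strategy that is $\varepsilon$-better than $\pi$ and a Markov strategy that is 0-better than $\pi$.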