A sampled fictitious play based learning algorithm for infinite horizon Markov decision processes

  • Authors:
  • Esra Sisikoglu; Marina A. Epelman; Robert L. Smith

  • Affiliations:
  • The University of Missouri, Columbia, MO; The University of Michigan, Ann Arbor, MI; The University of Michigan, Ann Arbor, MI

  • Venue:
  • Proceedings of the Winter Simulation Conference
  • Year:
  • 2011

Abstract

Using Sampled Fictitious Play (SFP) concepts, we develop SFPL (Sampled Fictitious Play Learning), a learning algorithm for solving discounted homogeneous Markov decision problems in which the transition probabilities are unknown and must be learned via simulation or direct observation of the system in real time. SFPL simultaneously updates the estimates of the unknown transition probabilities and the estimates of the optimal value and optimal action in the observed state. In the spirit of SFP, the action after each transition is selected by sampling from the empirical distribution of previous optimal-action estimates for the current state. The resulting algorithm is provably convergent. We compare its performance with that of other learning methods, including SARSA and Q-learning.
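To make the abstract's description concrete, the loop below is a minimal sketch of an SFPL-style learner: it maintains empirical transition counts, performs a Bellman update at the observed state using the current empirical model, and selects the next action by sampling uniformly from the history of past best-action estimates for that state. All names (`sfpl_sketch`, `simulate`, the exact update order, and the reward-averaging scheme) are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import defaultdict

def sfpl_sketch(simulate, states, actions, gamma=0.95, steps=5000, seed=0):
    """Illustrative SFPL-style loop (hypothetical; not the paper's exact algorithm).

    simulate(s, a) -> (s_next, reward) plays the role of the real system
    or simulator whose transition probabilities are unknown.
    """
    rng = random.Random(seed)
    trans = defaultdict(lambda: defaultdict(int))  # trans[(s, a)][s'] = visit count
    rew_sum = defaultdict(float)                   # cumulative reward per (s, a)
    V = defaultdict(float)                         # value estimates
    # History of past best-action estimates; seeded with every action so each
    # action has positive sampling probability initially.
    history = {s: list(actions) for s in states}

    def q(s, a):
        # Q-value under the current empirical transition/reward estimates.
        n = sum(trans[(s, a)].values())
        if n == 0:
            return 0.0
        r_hat = rew_sum[(s, a)] / n
        return r_hat + gamma * sum(m / n * V[t] for t, m in trans[(s, a)].items())

    s = states[0]
    for _ in range(steps):
        # SFP-style action choice: sample from the empirical distribution
        # of previous optimal-action estimates for the current state.
        a = rng.choice(history[s])
        s_next, r = simulate(s, a)
        trans[(s, a)][s_next] += 1     # update transition estimate
        rew_sum[(s, a)] += r           # update reward estimate
        # Update the optimal value and optimal action in the observed state.
        best = max(actions, key=lambda act: q(s, act))
        V[s] = q(s, best)
        history[s].append(best)
        s = s_next
    return V, history
```

The key contrast with SARSA and Q-learning, which the abstract compares against, is that this scheme learns an explicit empirical model of the transition probabilities and backs values up through it, rather than updating action values directly from sampled rewards.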