A sampled fictitious play based learning algorithm for infinite horizon Markov decision processes
Proceedings of the Winter Simulation Conference
Sampled fictitious play (SFP) is a recently proposed iterative learning mechanism for computing Nash equilibria of non-cooperative games. In games of identical interests, every limit point of the sequence of mixed strategies induced by the empirical frequencies of the players' best-response actions in SFP is a Nash equilibrium. Because discrete optimization problems can be viewed as games of identical interests in which Nash equilibria define a type of local optimum, SFP has recently been employed as a heuristic optimization algorithm with promising empirical performance. However, no guarantee of convergence to a globally optimal Nash equilibrium has been established for any of the problem classes considered to date. In this paper, we introduce a variant of SFP and show that it converges almost surely to optimal policies in model-free, finite-horizon stochastic dynamic programs. The key idea is to view the dynamic programming states as players whose common interest is to maximize the total multi-period expected reward starting in a fixed initial state. We also present empirical results suggesting that our SFP variant is effective in practice on small- to moderate-sized model-free problems.
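To make the mechanism concrete, the following toy Python sketch runs SFP on a two-player game of identical interests. This is an illustrative assumption, not the paper's algorithm: the coordination payoff, the two-action set, and the helper names (`payoff`, `best_response`, `sfp`) are all invented for the example. Each player repeatedly samples an action from the other player's history of past best responses, best-responds to that sample, and appends the result to its own history; the empirical frequencies of those histories induce the mixed strategies whose limit points are Nash equilibria.

```python
import random
from collections import Counter

# Toy sketch of sampled fictitious play (SFP) for a two-player game of
# identical interests. The payoff, action set, and function names are
# illustrative assumptions, not the algorithm from the paper.

ACTIONS = [0, 1]

def payoff(a, b):
    """Identical-interest payoff: coordinating on action 1 is globally
    optimal, coordinating on 0 is a locally optimal Nash equilibrium."""
    if a == b:
        return 1.0 if a == 1 else 0.5
    return 0.0

def best_response(opponent_action):
    """Best response to a sampled opponent action (the payoff is
    symmetric, so the same function serves both players)."""
    return max(ACTIONS, key=lambda a: payoff(a, opponent_action))

def sfp(iterations=500, seed=0):
    rng = random.Random(seed)
    # Each player's history of best responses, seeded with one random action.
    histories = [[rng.choice(ACTIONS)], [rng.choice(ACTIONS)]]
    for _ in range(iterations):
        # Each player samples the other's past play and best-responds to it.
        sample_for_0 = rng.choice(histories[1])
        sample_for_1 = rng.choice(histories[0])
        histories[0].append(best_response(sample_for_0))
        histories[1].append(best_response(sample_for_1))
    # Empirical frequencies of best responses induce the mixed strategies.
    freqs = []
    for h in histories:
        counts = Counter(h)
        freqs.append({a: counts[a] / len(h) for a in ACTIONS})
    return freqs

if __name__ == "__main__":
    f0, f1 = sfp()
    print(f0, f1)
```

In the paper's setting, the players would instead be dynamic programming states, each best-responding with an action that maximizes the simulated total multi-period expected reward from the fixed initial state; the two-player coordination game above only illustrates the histories-and-best-responses loop itself.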