This paper investigates value function approximation in the context of zero-sum Markov games, which can be viewed as a generalization of the Markov decision process (MDP) framework to the two-agent case. We generalize error bounds from MDPs to Markov games and describe how reinforcement learning algorithms extend to Markov games. We also present a generalization of the optimal stopping problem to a two-player simultaneous-move Markov game. For this special problem, we provide stronger bounds and can guarantee convergence for least-squares temporal-difference (LSTD) learning and temporal-difference learning with linear value function approximation. We demonstrate the viability of value function approximation for Markov games by using the least-squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem.
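The generalization underlying these algorithms replaces the MDP's max-backup with a minimax backup over the two players' simultaneous actions (Shapley's operator):

    V(s) = max_{pi in Delta(A)} min_{o in O} sum_a pi(a) [ R(s,a,o) + gamma * sum_{s'} P(s'|s,a,o) V(s') ]

As a minimal sketch of the per-state backup (not the paper's own implementation; the function name minimax_value and the use of scipy's linprog are our choices), the inner max-min reduces to a small linear program over the maximizer's mixed strategy:

import numpy as np
from scipy.optimize import linprog

def minimax_value(Q):
    # Q[a, o]: payoff to the maximizer when it plays row a and the
    # minimizer plays column o. Returns the game value and the
    # maximizer's optimal mixed strategy over its rows.
    n, m = Q.shape
    # Decision variables: [pi_1, ..., pi_n, v]; maximize v == minimize -v.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    # For every opponent action o: v - sum_a pi_a * Q[a, o] <= 0.
    A_ub = np.hstack([-Q.T, np.ones((m, 1))])
    b_ub = np.zeros(m)
    # pi must be a probability distribution: sum_a pi_a = 1.
    A_eq = np.zeros((1, n + 1))
    A_eq[0, :n] = 1.0
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n + [(None, None)]  # v is unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:n]

# Matching pennies: value 0, optimal strategy (0.5, 0.5).
Q = np.array([[1.0, -1.0], [-1.0, 1.0]])
v, pi = minimax_value(Q)

A full sweep of value iteration would apply minimax_value to each state's stage-game matrix Q[s][a, o] = R(s,a,o) + gamma * sum_{s'} P(s'|s,a,o) V(s'); the LSPI generalization described in the abstract replaces this exact Q with a learned linear approximation.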