Stability of learning dynamics in two-agent, imperfect-information games

Authors:
John M. Butterworth;Jonathan L. Shapiro
Affiliations:
University of Manchester, Manchester, United Kingdom;University of Manchester, Manchester, United Kingdom
Venue:
Proceedings of the tenth ACM SIGEVO workshop on Foundations of genetic algorithms
Year:
2009


Abstract

A central issue in multi-agent co-adaptive learning is convergence. When two (or more) agents play a game with different information and different payoffs, the typical behaviour is oscillation around a Nash equilibrium. Several algorithms have been proposed to force convergence to mixed-strategy Nash equilibria in imperfect-information games when the agents are aware of their opponent's strategy. We consider how one such algorithm, the lagging anchor algorithm, is affected when each agent must also infer the gradient information from observations, working in the infinitesimal time-step limit. Using an estimated gradient, obtained either by opponent modelling or by stochastic gradient ascent, destabilises the algorithm in a region of parameter space. There are two phases of behaviour. If the rate of estimation is low, the Nash equilibrium becomes unstable in the mean. If the rate is high, the Nash equilibrium is an attractive fixed point in the mean, but the uncertainty acts as narrow-band coloured noise, which causes damped oscillations.
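
To make the contrast between oscillation and forced convergence concrete, the following is a minimal sketch, not the authors' implementation. It assumes the game is matching pennies, that each agent has exact (not estimated) gradient information, and that the anchor is a slow-moving average pulling each strategy back with strength `eta`, in the spirit of the lagging anchor algorithm. The function names, parameter values, and Euler discretisation are illustrative assumptions; the regime the abstract analyses, with estimated gradients, is not modelled here.

```python
import math


def clip01(x):
    """Project a probability back onto [0, 1]."""
    return min(1.0, max(0.0, x))


def simulate(eta, steps=2000, dt=0.01, p0=0.7, q0=0.4):
    """Euler-integrate anchored gradient dynamics on matching pennies.

    p, q are the probabilities that players 1 and 2 play 'heads'.
    Player 1's expected payoff is (2p - 1)(2q - 1); player 2 receives
    the negative. eta is an assumed anchor strength; eta = 0 recovers
    plain gradient ascent. Returns the final distance of (p, q) from
    the mixed Nash equilibrium (0.5, 0.5).
    """
    p, q = p0, q0
    p_bar, q_bar = p0, q0  # anchors start at the initial strategies
    for _ in range(steps):
        grad_p = 2.0 * (2.0 * q - 1.0)    # d(payoff_1)/dp
        grad_q = -2.0 * (2.0 * p - 1.0)   # d(payoff_2)/dq
        # Each strategy follows its gradient and is pulled toward its anchor.
        new_p = clip01(p + dt * (grad_p + eta * (p_bar - p)))
        new_q = clip01(q + dt * (grad_q + eta * (q_bar - q)))
        # Each anchor drifts slowly toward the current strategy.
        p_bar += dt * eta * (p - p_bar)
        q_bar += dt * eta * (q - q_bar)
        p, q = new_p, new_q
    return math.hypot(p - 0.5, q - 0.5)


if __name__ == "__main__":
    print("plain gradient ascent:", simulate(eta=0.0))
    print("with lagging anchor:  ", simulate(eta=1.0))
```

Under these assumptions, the run with `eta=0.0` keeps circling the equilibrium without approaching it, while the run with `eta=1.0` spirals in towards (0.5, 0.5).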