Convergent Gradient Ascent in General-Sum Games

Authors:
Bikramjit Banerjee;Jing Peng
Affiliations:
-;-
Venue:
ECML '02 Proceedings of the 13th European Conference on Machine Learning
Year:
2002

Citing 5
Cited 3

The dynamics of reinforcement learning in cooperative multiagent systems

AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Multiagent learning using a variable learning rate

Artificial Intelligence
Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Nash Convergence of Gradient Dynamics in General-Sum Games

UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence
Fast concurrent reinforcement learners

IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2

Adaptive policy gradient in multiagent learning

AAMAS '03 Proceedings of the second international joint conference on Autonomous agents and multiagent systems
Multiagent Reinforcement Learning with Spiking and Non-Spiking Agents in the Iterated Prisoner's Dilemma

ICANN '09 Proceedings of the 19th International Conference on Artificial Neural Networks: Part I
Exploiting based pre-testing in competition environment

PRIMA'06 Proceedings of the 9th Pacific Rim international conference on Agent Computing and Multi-Agent Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this work we look at the recent results in policy gradient learning in a general-sum game scenario, in the form of two algorithms, IGA and WoLF-IGA. We address the drawbacks in convergence properties of these algorithms, and propose a more accurate version of WoLF-IGA that is guaranteed to converge to Nash Equilibrium policies in self-play (or against an IGA learner). We also present a control theoretic interpretation of variable learning rate which not only justifies WoLF-IGA, but also shows it to achieve fastest convergence under some constraints. Finally we derive optimal learning rates for fastest convergence in practical simulations.