Inspired by recent results on policy gradient learning in general-sum games, embodied in the IGA and WoLF-IGA algorithms, we explore an alternative version of the WoLF (Win or Learn Fast) criterion. We show that our new criterion, PDWoLF, is also accurate in 2 × 2 games, and, unlike WoLF, which must be estimated in games with more than two actions, it remains exactly computable there. In particular, we show that this difference in accuracy translates into faster convergence to Nash equilibrium policies in self-play when either criterion is combined with the general Policy Hill Climbing (PHC) algorithm. This speedup becomes more pronounced as the ratio of the two learning rates grows, and we offer an explanation for the effect. We also show experimentally that learning faster with PDWoLF can mean learning better policies earlier in self-play. Finally, we present a scalable version of PDWoLF and show that even in domains requiring generalization and approximation, PDWoLF can dominate WoLF in performance.
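To make the two-learning-rate scheme concrete, here is a minimal Python sketch of a PHC-style learner that switches rates by a policy-dynamics test. The abstract does not spell the criterion out, so everything here is an assumption-laden illustration: the constants DELTA_WIN and DELTA_LOSE, the class name PDWoLFPHC, and the winning test (the product of the last policy change and its second difference being negative) reflect our reading of the policy-dynamics idea, not the authors' implementation; the clip-and-renormalize projection also simplifies PHC's exact update.

```python
import numpy as np

# Illustrative constants; the ratio DELTA_LOSE / DELTA_WIN is the
# "learning rate ratio" the abstract refers to.
ALPHA = 0.1        # Q-value learning rate
DELTA_WIN = 0.01   # cautious policy step when deemed "winning"
DELTA_LOSE = 0.04  # fast policy step when deemed "losing"

class PDWoLFPHC:
    """Single-state (matrix game) PHC learner with a PDWoLF-style rate switch."""

    def __init__(self, n_actions, rng=None):
        self.n = n_actions
        self.q = np.zeros(n_actions)                   # action-value estimates
        self.pi = np.full(n_actions, 1.0 / n_actions)  # mixed policy
        self.d = np.zeros(n_actions)                   # last policy change
        self.d2 = np.zeros(n_actions)                  # change of that change
        self.rng = rng or np.random.default_rng()

    def act(self):
        return int(self.rng.choice(self.n, p=self.pi))

    def update(self, action, reward):
        # 1. Ordinary Q-learning update for the played action.
        self.q[action] += ALPHA * (reward - self.q[action])

        # 2. PDWoLF-style test (assumed form): "winning" when the last
        #    policy change and its trend disagree in sign, so learn slowly;
        #    otherwise learn fast. This is exactly computable from stored
        #    quantities, with no estimate of an equilibrium policy needed.
        winning = self.d[action] * self.d2[action] < 0
        delta = DELTA_WIN if winning else DELTA_LOSE

        # 3. PHC step: shift probability mass toward the greedy action,
        #    then clip and renormalize (a simplification of PHC's projection).
        greedy = int(np.argmax(self.q))
        new_pi = self.pi.copy()
        for a in range(self.n):
            step = delta if a == greedy else -delta / (self.n - 1)
            new_pi[a] = min(1.0, max(0.0, new_pi[a] + step))
        new_pi /= new_pi.sum()

        # 4. Track the policy dynamics used by the winning test.
        new_d = new_pi - self.pi
        self.d2 = new_d - self.d
        self.d = new_d
        self.pi = new_pi

# Self-play in matching pennies (illustrative): player 1 gets +1 on a match.
p1, p2 = PDWoLFPHC(2), PDWoLFPHC(2)
for _ in range(5000):
    a1, a2 = p1.act(), p2.act()
    r = 1.0 if a1 == a2 else -1.0
    p1.update(a1, r)
    p2.update(a2, -r)
print(p1.pi, p2.pi)  # both policies should hover near the (0.5, 0.5) equilibrium
```

The design point the sketch tries to convey is the one the abstract emphasizes: because the winning test is computed from the learner's own stored policy changes rather than estimated, it stays exact regardless of the number of actions.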