Learning automata: an introduction
The dynamics of reinforcement learning in cooperative multiagent systems
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Introduction to Reinforcement Learning
Social Agents Playing a Periodical Policy
EMCL '01 Proceedings of the 12th European Conference on Machine Learning
On No-Regret Learning, Fictitious Play, and Nash Equilibrium
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Asymmetric multiagent reinforcement learning
Web Intelligence and Agent Systems
Scientific Programming - Distributed Computing and Applications
Reaching pareto-optimality in prisoner's dilemma using conditional joint action learning
Autonomous Agents and Multi-Agent Systems
An adaptive policy gradient in learning Nash equilibria
Neurocomputing
A momentum-based approach to learning nash equilibria
PRIMA'06 Proceedings of the 9th Pacific Rim international conference on Agent Computing and Multi-Agent Systems
Coordination is an important issue in multi-agent systems when agents want to maximize their revenue. Coordination is often achieved through communication, but communication has its price. We are interested in an approach that keeps communication between the agents low while still finding a globally optimal behavior.

In this paper we report on an efficient approach that allows independent reinforcement learning agents to reach a Pareto-optimal Nash equilibrium with limited communication. The communication happens at regular time steps and is essentially a signal for the agents to start an exploration phase. During each exploration phase, some agents exclude their current best action so as to give the team the opportunity to look for a possibly better Nash equilibrium. This technique of reducing the action space by exclusions was only recently introduced for finding periodical policies in games of conflicting interests. Here, we explore the technique in repeated common-interest games with deterministic or stochastic outcomes.
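The exclusion mechanism can be illustrated with a minimal sketch. This is not the authors' algorithm: the climbing-game payoff matrix and all names are illustrative, and the agents' learners are simplified to a shared best-payoff memory. At each synchronized exploration phase, one agent excludes its half of the best joint action found so far, forcing the team to search outside the equilibrium it has already settled on.

```python
import random

# Common-interest "climbing game": both agents receive the same payoff
# (payoffs are illustrative; the joint action (0, 0) is the
# Pareto-optimal Nash equilibrium).
PAYOFF = [[11, -30, 0],
          [-30,  7,  6],
          [0,    0,  5]]

def explore_with_exclusions(episodes, phase_len, seed=0):
    """Independent agents sample actions in a common-interest game;
    at the start of each exploration phase one agent rules out its
    half of the best joint action found so far."""
    rng = random.Random(seed)
    n_actions = len(PAYOFF)
    best_joint, best_payoff = None, float("-inf")
    excluded = [None, None]          # per-agent excluded action this phase
    for t in range(episodes):
        if t % phase_len == 0 and best_joint is not None:
            # The periodic "signal": begin a new exploration phase in
            # which one agent excludes its current best action.
            agent = (t // phase_len) % 2
            excluded = [None, None]
            excluded[agent] = best_joint[agent]
        actions = []
        for i in range(2):
            choices = [a for a in range(n_actions) if a != excluded[i]]
            actions.append(rng.choice(choices))
        payoff = PAYOFF[actions[0]][actions[1]]
        if payoff > best_payoff:
            best_joint, best_payoff = tuple(actions), payoff
    return best_joint, best_payoff
```

With enough exploration the sketch settles on the Pareto-optimal joint action, and later exclusion phases verify that no better equilibrium exists; a faithful implementation would replace the best-payoff memory with the agents' actual reinforcement learners and handle stochastic payoffs by averaging.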