Exploring selfish reinforcement learning in repeated games with stochastic rewards

Authors:
Katja Verbeeck; Ann Nowé; Johan Parent; Karl Tuyls
Affiliations:
Katja Verbeeck, Ann Nowé, Johan Parent: Computational Modeling Lab (COMO), Vrije Universiteit Brussel, Brussels, Belgium; Karl Tuyls: Institute for Knowledge and Agent Technology (IKAT), University of Maastricht, Maastricht, The Netherlands
Venue:
Autonomous Agents and Multi-Agent Systems
Year:
2007

Abstract

In this paper we introduce a new multi-agent reinforcement learning algorithm called exploring selfish reinforcement learning (ESRL). ESRL allows agents to reach optimal solutions in repeated non-zero-sum games with stochastic rewards by using coordinated exploration. First, two ESRL algorithms are presented, for common interest and conflicting interest games respectively. Both are based on the same idea: an agent explores by temporarily excluding some of its local actions from its private action space, giving the team of agents the opportunity to look for better solutions in a reduced joint action space. In a later stage, these two algorithms are combined into one generic algorithm that does not assume the type of the game is known in advance. ESRL is able to find the Pareto optimal solution in common interest games without communication. In conflicting interest games, ESRL needs only limited communication to learn a fair periodical policy, resulting in a good overall policy. ESRL agents are independent in the sense that they base their decisions only on their own action choices and rewards. They are also flexible in learning different solution concepts, and they can handle both stochastic, possibly delayed rewards and asynchronous action selection. A real-life experiment, adaptive load balancing of parallel applications, is also included.
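The exploration-by-exclusion idea for common interest games can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes two agents, each modeled as a linear reward-inaction (L_R-I) learning automaton, playing a hypothetical 3-by-3 common interest game with Bernoulli rewards in which only matching actions pay. All names (`LriAutomaton`, `exploration_phase`, `esrl_common_interest`) and parameter values are illustrative assumptions. Each phase lets the automata converge in the current reduced joint action space, estimates the average reward of the joint action reached, and then has each agent exclude its own converged action; the best joint action seen is kept.

```python
import random


class LriAutomaton:
    """Linear reward-inaction (L_R-I) automaton over one agent's private action set.

    Assumption: rewards lie in [0, 1]; a zero reward leaves the probabilities unchanged.
    """

    def __init__(self, actions):
        self.actions = list(actions)
        self.probs = [1.0 / len(self.actions)] * len(self.actions)

    def choose(self, rng):
        # Sample an index into self.actions according to the current probabilities.
        return rng.choices(range(len(self.actions)), weights=self.probs)[0]

    def update(self, idx, reward, alpha):
        # Shift probability mass toward the chosen action in proportion to the reward;
        # the vector keeps summing to 1.
        for j in range(len(self.probs)):
            if j == idx:
                self.probs[j] += alpha * reward * (1.0 - self.probs[j])
            else:
                self.probs[j] -= alpha * reward * self.probs[j]

    def converged(self, threshold):
        return max(self.probs) >= threshold

    def best(self):
        return self.actions[max(range(len(self.probs)), key=self.probs.__getitem__)]


# Hypothetical common interest game: only matching actions pay, and each value is
# the success probability of a Bernoulli (stochastic) reward shared by both agents.
SUCCESS_PROB = {(0, 0): 0.9, (1, 1): 0.5, (2, 2): 0.7}


def sample_reward(joint, rng):
    return 1.0 if rng.random() < SUCCESS_PROB.get(joint, 0.0) else 0.0


def exploration_phase(actions1, actions2, rng, alpha=0.1, threshold=0.99, max_steps=20000):
    """Let two independent automata learn in the current (reduced) joint action space."""
    agent1, agent2 = LriAutomaton(actions1), LriAutomaton(actions2)
    for _ in range(max_steps):
        if agent1.converged(threshold) and agent2.converged(threshold):
            break
        i, j = agent1.choose(rng), agent2.choose(rng)
        reward = sample_reward((agent1.actions[i], agent2.actions[j]), rng)
        # Each agent updates from its own action and the reward only (no communication).
        agent1.update(i, reward, alpha)
        agent2.update(j, reward, alpha)
    return agent1.best(), agent2.best()


def esrl_common_interest(actions, rng, eval_steps=2000):
    """Simplified exploration-by-exclusion loop (illustrative, not the paper's code)."""
    remaining1, remaining2 = list(actions), list(actions)
    averages = {}
    while remaining1 and remaining2:
        joint = exploration_phase(remaining1, remaining2, rng)
        # Estimate the average stochastic payoff of the joint action reached.
        averages[joint] = sum(sample_reward(joint, rng) for _ in range(eval_steps)) / eval_steps
        # Each agent temporarily excludes its own converged action from its action set.
        remaining1.remove(joint[0])
        remaining2.remove(joint[1])
    return max(averages, key=averages.get), averages


if __name__ == "__main__":
    best, averages = esrl_common_interest([0, 1, 2], random.Random(0))
    print("best joint action:", best)
    print("estimated averages:", averages)
```

In this particular game the two automata start identical and only ever update together (a reward requires matching actions), so every phase ends on a matching joint action and three phases visit all of them. In general games an L_R-I phase may settle on a suboptimal equilibrium, which is exactly what the exclusion step is meant to move the agents away from.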