Multi-Agent Reinforcement Learning Algorithm with Variable Optimistic-Pessimistic Criterion

Authors:
Natalia Akchurina
Affiliations:
International Graduate School of Dynamic Intelligent Systems, University of Paderborn, Germany, email: anatalia@mail.uni-paderborn.de
Venue:
Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence
Year:
2008

Citing 6
Cited 0

The dynamics of reinforcement learning in cooperative multiagent systems

AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Multiagent learning using a variable learning rate

Artificial Intelligence
Friend-or-Foe Q-learning in General-Sum Games

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Generalized Markov Decision Processes: Dynamic-programming and Reinforcement-learning Algorithms

Generalized Markov Decision Processes: Dynamic-programming and Reinforcement-learning Algorithms
Run the GAMUT: A Comprehensive Approach to Evaluating Game-Theoretic Algorithms

AAMAS '04 Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 2

Quantified Score

Hi-index	0.00

Visualization

Abstract

A reinforcement learning algorithm for multi-agent systems based on variable Hurwicz's optimistic-pessimistic criterion is proposed. The formal proof of its convergence is given. Hurwicz's criterion allows to embed initial knowledge of how friendly the environment in which the agent is supposed to function will be. Thorough testing of the developed algorithm against well-known reinforcement learning algorithms has shown that in many cases its successful performance can be explained by its tendency to force the other agents to follow the policy which is more profitable for it. In addition the variability of Hurwicz's criterion allowed it to converge to best-response against opponents with stationary policies.