A reinforcement learning algorithm for multi-agent systems based on a variable form of Hurwicz's optimistic-pessimistic criterion is proposed, and a formal proof of its convergence is given. Hurwicz's criterion makes it possible to embed prior knowledge of how friendly the agent's environment is expected to be. Thorough testing of the algorithm against well-known reinforcement learning algorithms has shown that in many cases its successful performance can be explained by its tendency to force the other agents to follow the policy that is more profitable for it. In addition, the variability of Hurwicz's criterion allows the algorithm to converge to a best response against opponents with stationary policies.
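For illustration, the classical Hurwicz criterion scores an action as a weighted mix of its best and worst outcomes: the weight on the best outcome expresses how friendly the environment is assumed to be. The following is a minimal sketch of that criterion only, not the learning algorithm or update rule proposed here; the names `hurwicz_value`, `hurwicz_action`, `optimism`, and the payoff table are hypothetical and chosen for this example.

```python
from typing import List, Sequence, Tuple


def hurwicz_value(payoffs: Sequence[float], optimism: float) -> float:
    """Classical Hurwicz score: optimism * best case + (1 - optimism) * worst case.

    `optimism` in [0, 1] is an assumed parameter name: 1.0 trusts the
    environment fully (maximax), 0.0 assumes an adversary (maximin).
    """
    if not 0.0 <= optimism <= 1.0:
        raise ValueError("optimism must lie in [0, 1]")
    return optimism * max(payoffs) + (1.0 - optimism) * min(payoffs)


def hurwicz_action(q_table: Sequence[Sequence[float]],
                   optimism: float) -> Tuple[int, List[float]]:
    """Pick the row (own action) with the highest Hurwicz score.

    q_table[a][o] is the estimated payoff of own action `a` when the
    other agent plays `o`. Ties go to the lowest action index.
    """
    scores = [hurwicz_value(row, optimism) for row in q_table]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical payoff estimates: action 0 is risky, action 1 is safe.
q_table = [[4.0, 0.0],
           [2.0, 2.0]]

pessimistic_choice, _ = hurwicz_action(q_table, optimism=0.0)
optimistic_choice, _ = hurwicz_action(q_table, optimism=1.0)
```

With these assumed payoffs, a fully pessimistic agent selects the safe action 1, while a fully optimistic agent selects the risky action 0. A variable criterion, as described in the abstract, would adjust the weight during learning rather than fix it in advance.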