Decentralized Learning in Markov Games

Authors:
P. Vrancx;K. Verbeeck;A. Nowe
Affiliations:
Comput. Modeling Lab., Vrije Univ. Brussel, Brussels;-;-
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Year:
2008

Citing 0
Cited 8

Analyzing the dynamics of stigmergetic interactions through pheromone games
Theoretical Computer Science

Active learning of plans for safety and reachability goals with partial observability
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics

Learning multi-agent state space representations
Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Volume 1

Taking turns in general sum Markov games
Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Volume 1

Speeding up learning automata based multi agent systems using the concepts of stigmergy and entropy
Expert Systems with Applications: An International Journal

Decentralized learning in wireless sensor networks
ALA'09 Proceedings of the Second international conference on Adaptive and Learning Agents

Solving sparse delayed coordination problems in multi-agent reinforcement learning
ALA'11 Proceedings of the 11th international conference on Adaptive and Learning Agents

Multi-agent differential graphical games: Online adaptive learning solution for synchronization with optimality
Automatica (Journal of IFAC)

Quantified Score

Hi-index	0.00

Abstract

Learning automata (LA) were recently shown to be valuable tools for designing multiagent reinforcement learning algorithms. One of the principal contributions of LA theory is that a set of decentralized, independent LA is able to control a finite Markov chain with unknown transition probabilities and rewards. In this paper, we propose to extend this algorithm to Markov games, a straightforward extension of single-agent Markov decision problems to distributed multiagent decision problems. We show that, under the same ergodicity assumptions as the original theorem, the extended algorithm converges to a pure equilibrium point between agent policies.
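
To make the notion of a learning automaton concrete, the following minimal Python sketch shows a single automaton adjusting its action probabilities from reward feedback. The abstract does not state which update rule the paper uses; the linear reward-inaction scheme shown here is an assumption, chosen because it is a standard LA update. The names `lri_update` and `run_automaton`, the learning rate, and the Bernoulli reward model are hypothetical and serve only as illustration.

```python
import random


def lri_update(probs, action, reward, rate=0.1):
    """One linear reward-inaction step (assumed update rule).

    probs  -- current action probabilities (sum to 1)
    action -- index of the action that was just played
    reward -- feedback in [0, 1]; a reward of 0 leaves probs unchanged
    """
    return [
        p + rate * reward * (1.0 - p) if i == action else p - rate * reward * p
        for i, p in enumerate(probs)
    ]


def run_automaton(reward_probs, steps=5000, rate=0.05, seed=0):
    """Let one automaton learn in a stationary environment.

    reward_probs[i] is the (hypothetical) chance that action i is rewarded.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    probs = [1.0 / n] * n  # start from a uniform policy
    for _ in range(steps):
        action = rng.choices(range(n), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        probs = lri_update(probs, action, reward, rate)
    return probs


if __name__ == "__main__":
    print(run_automaton([0.2, 0.8]))
```

In the decentralized setting the abstract describes, each agent would run its own automata and observe only its own actions and rewards; this sketch covers just the single-automaton building block, not the Markov game algorithm itself.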