Littman, M. L. Friend-or-Foe Q-learning in General-Sum Games. In ICML '01: Proceedings of the Eighteenth International Conference on Machine Learning.
Bowling, M. Convergence Problems of General-Sum Multiagent Reinforcement Learning. In ICML '00: Proceedings of the Seventeenth International Conference on Machine Learning.
Hu, J. and Wellman, M. P. Nash Q-learning for general-sum stochastic games. The Journal of Machine Learning Research.
Laumonier, J. and Chaib-draa, B. Partial local FriendQ multiagent learning: application to team automobile coordination problem. In AI'06: Proceedings of the 19th International Conference on Advances in Artificial Intelligence (Canadian Society for Computational Studies of Intelligence).
The application of reinforcement learning principles to the search for equilibrium policies in stochastic games (SGs) has met with some success ([3], [4], [2]). The key insight of this approach is that each agent can learn its own β-discounted-reward equilibrium policy by keeping track of the Q-values of all agents, including itself, and treating the Q-value matrix for each state as its payoff matrix. Each agent observes the actions the other agents take and the payoffs they receive. There is some evidence that, in practice, agents that do not observe the actions and payoffs of other agents (hereafter called imperfectly observing agents) can still learn adversarial equilibrium (AE) policies in general-sum SGs using naive Q-learning ([1]). Taking the Prisoner's Dilemma stage game (Table 1) as an abstraction of an SG, this implies that, even while ignoring the other agents' play, agents still learn to play DD, the adversarial equilibrium joint action. The payoff received under DD can be thought of as each agent's security level.
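To illustrate this behavior, the following is a minimal sketch (not the paper's implementation) of two imperfectly observing, naive Q-learners repeatedly playing a Prisoner's Dilemma stage game; the payoff values, learning rate, and exploration rate are illustrative assumptions, and the single-state discounted bootstrap term is omitted since it shifts all Q-values equally without changing the greedy action.

# Minimal sketch: two independent, imperfectly observing Q-learners
# repeatedly play the Prisoner's Dilemma stage game. Each agent sees
# only its own action and payoff; payoff values are illustrative.
import random

C, D = 0, 1  # actions: cooperate, defect
# payoffs[(a1, a2)] -> (reward to agent 1, reward to agent 2)
payoffs = {(C, C): (3, 3), (C, D): (0, 5),
           (D, C): (5, 0), (D, D): (1, 1)}

alpha, epsilon, episodes = 0.1, 0.1, 20000
Q = [[0.0, 0.0], [0.0, 0.0]]  # Q[agent][action]; single-state stage game

def choose(q):
    # epsilon-greedy action selection over the agent's own Q-values
    if random.random() < epsilon:
        return random.choice((C, D))
    return max((C, D), key=lambda a: q[a])

for _ in range(episodes):
    a1, a2 = choose(Q[0]), choose(Q[1])
    r1, r2 = payoffs[(a1, a2)]
    # naive update: each agent ignores the other's action and payoff
    Q[0][a1] += alpha * (r1 - Q[0][a1])
    Q[1][a2] += alpha * (r2 - Q[1][a2])

print("Agent 1 Q-values (C, D):", Q[0])  # Q for D exceeds Q for C
print("Agent 2 Q-values (C, D):", Q[1])

Because defection strictly dominates cooperation under these payoffs, each agent's Q-value for D overtakes its Q-value for C regardless of the opponent's play, and joint play converges to DD with payoff (1, 1): the security-level outcome described above.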