Individual Q-Learning in Normal Form Games

Authors:
David S. Leslie; E. J. Collins
Venue:
SIAM Journal on Control and Optimization
Year:
2005

Citing: 0
Cited by: 3

Dynamic analysis of multiagent Q-learning with ε-greedy exploration
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning

EA2: The Winning Strategy for the Inaugural Lemonade Stand Game Tournament
Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence

Agent learning in autonomic manufacturing execution systems for enterprise networking
Computers and Industrial Engineering

Abstract

The single-agent multi-armed bandit problem can be solved by an agent that learns the value of each action using reinforcement learning. However, the multi-agent version of the problem, the iterated normal form game, presents a more complex challenge, since the rewards available to each agent depend on the strategies of the others. We consider the behavior of value-based learning agents in this situation and show that such agents cannot generally play at a Nash equilibrium, although a Nash distribution can be reached if smooth best responses are used. We introduce a particular value-based learning algorithm, which we call individual Q-learning, and use stochastic approximation to study its asymptotic behavior, showing that strategies converge to a Nash distribution almost surely in two-player zero-sum games and two-player partnership games. Player-dependent learning rates are then considered, and we show that this extension converges in some games for which many algorithms, including the basic algorithm initially considered, fail to converge.
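The abstract describes value-based agents that estimate a value for each of their own actions and choose among them through a smooth best response. As a minimal sketch (not the authors' code), the Python below pairs two such learners in a 2×2 game using Boltzmann (softmax) action selection at temperature `tau`. The function name `individual_q_learning`, the step-size schedule `(n + offset) ** -rho[i]`, and the update that divides by the probability of the action played (so that the expected change tracks each action's expected reward) are illustrative assumptions, not necessarily the exact form used in the paper. Giving the two players different exponents in `rho` mimics player-dependent learning rates.

```python
import math
import random


def boltzmann(q, tau):
    """Smooth best response: softmax of the value estimates at temperature tau."""
    m = max(q)  # subtract the max for numerical stability
    w = [math.exp((v - m) / tau) for v in q]
    s = sum(w)
    return [x / s for x in w]


def sample(probs, rng):
    """Draw an action index according to the probability vector probs."""
    u = rng.random()
    acc = 0.0
    for a, p in enumerate(probs):
        acc += p
        if u < acc:
            return a
    return len(probs) - 1


def individual_q_learning(payoffs, steps=50_000, tau=1.0, rho=(0.6, 0.8),
                          offset=100, q_init=None, seed=0):
    """Hypothetical sketch of two value-based learners with Boltzmann play.

    payoffs[i][a1][a2] is player i's reward when player 1 plays a1 and
    player 2 plays a2. rho[i] is player i's (assumed) step-size exponent,
    giving player-dependent learning rates lam = (n + offset) ** -rho[i].
    Returns the final Boltzmann strategies of both players.
    """
    rng = random.Random(seed)
    n_actions = (len(payoffs[0]), len(payoffs[0][0]))
    q = q_init or [[0.0] * n_actions[0], [0.0] * n_actions[1]]
    q = [list(qi) for qi in q]  # copy so the caller's lists are not mutated
    for n in range(steps):
        pi = [boltzmann(q[0], tau), boltzmann(q[1], tau)]
        a = (sample(pi[0], rng), sample(pi[1], rng))
        for i in range(2):
            lam = (n + offset) ** -rho[i]
            r = payoffs[i][a[0]][a[1]]
            # Only the action actually played is updated; dividing by its
            # probability makes the expected change lam * (E[reward] - Q).
            q[i][a[i]] += lam / pi[i][a[i]] * (r - q[i][a[i]])
    return [boltzmann(q[0], tau), boltzmann(q[1], tau)]


if __name__ == "__main__":
    # Matching pennies (zero-sum): player 1 wants to match, player 2 to mismatch.
    pennies_1 = [[1, -1], [-1, 1]]
    pennies_2 = [[-1, 1], [1, -1]]
    # Start player 1 biased towards its first action.
    pi1, pi2 = individual_q_learning([pennies_1, pennies_2],
                                     q_init=[[1.0, -1.0], [0.0, 0.0]])
    print(pi1, pi2)
```

In matching pennies, a two-player zero-sum game, the Nash distribution under Boltzmann responses is the uniform strategy for both players, so both returned strategies should drift toward [0.5, 0.5] even from the biased start. The `offset` is a stabilizing choice for this example: with ±1 payoffs and `tau=1.0` it keeps `lam / pi` below 1, so the value estimates stay bounded; other games or temperatures may need a different schedule.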