In large systems, it is important for agents to learn to act effectively, but sophisticated multi-agent learning algorithms generally do not scale. An alternative approach is to find restricted classes of games where simple, efficient algorithms converge. It is shown that stage learning efficiently converges to Nash equilibria in large anonymous games if best-reply dynamics converge. Two features are identified that improve convergence. First, contrary to the intuition that more agents make learning harder, a larger population is actually beneficial in many settings. Second, providing agents with statistical information about the behavior of others can significantly reduce the number of observations needed for convergence.
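The stage-learning dynamic described above can be sketched in a few lines: play proceeds in stages; within a stage each agent mostly repeats its current strategy but explores occasionally, and at the stage boundary it switches to its empirical best reply. The sketch below uses a toy anonymous coordination game (payoff is the number of agents choosing the same action) as a stand-in for a game where best-reply dynamics converge; the game, the parameter values, and the function names are illustrative assumptions, not the paper's experimental setup.

```python
import random
from collections import Counter

def payoff(action, counts):
    # Toy anonymous coordination game: an agent's reward depends only on its
    # own action and the aggregate action counts, not on which agents chose
    # what. (Hypothetical example, not the game studied in the paper.)
    return counts[action]

def stage_learning(n_agents=100, n_actions=2, n_stages=30,
                   rounds_per_stage=200, eps=0.1, seed=0):
    """Stage learning sketch: within each stage, agents play their current
    strategy with probability 1 - eps and explore uniformly otherwise,
    recording average rewards; at the stage boundary, each agent switches
    to its empirical best reply."""
    rng = random.Random(seed)
    current = [rng.randrange(n_actions) for _ in range(n_agents)]
    for _ in range(n_stages):
        totals = [[0.0] * n_actions for _ in range(n_agents)]
        visits = [[0] * n_actions for _ in range(n_agents)]
        for _ in range(rounds_per_stage):
            acts = [a if rng.random() > eps else rng.randrange(n_actions)
                    for a in current]
            counts = Counter(acts)  # anonymous: only aggregates matter
            for i, a in enumerate(acts):
                totals[i][a] += payoff(a, counts)
                visits[i][a] += 1
        # Stage boundary: each agent adopts its empirical best reply.
        for i in range(n_agents):
            est = [totals[i][a] / visits[i][a] if visits[i][a]
                   else float('-inf') for a in range(n_actions)]
            current[i] = max(range(n_actions), key=est.__getitem__)
    return Counter(current)

profile = stage_learning()
print(profile)  # nearly all agents coordinate on a single action
```

In this coordination game, best-reply dynamics converge (everyone moves toward the majority action), so stage learning settles into a pure Nash equilibrium where all agents coordinate. The `counts` aggregate also illustrates the abstract's second point: because the game is anonymous, statistics about the population are all an agent needs to estimate its best reply.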