Learning in multi-agent settings has recently garnered much interest, resulting in the development of somewhat effective multi-agent learning (MAL) algorithms for repeated normal-form games. However, general-purpose MAL algorithms for richer environments, such as general-sum repeated stochastic (Markov) games (RSGs), remain less advanced. Indeed, previous MAL algorithms for RSGs typically succeed only when the behavior of associates satisfies specific game-theoretic assumptions and when the game belongs to a particular class (such as zero-sum games). In this paper, we present a new algorithm, called Pepper, that can be used to extend MAL algorithms designed for repeated normal-form games to RSGs. We demonstrate that Pepper creates a family of new algorithms, each of whose asymptotic performance in RSGs is reminiscent of its asymptotic performance in related repeated normal-form games. We also show that some algorithms formed with Pepper outperform existing algorithms in an interesting RSG.
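To make the abstract's core idea concrete, the following Python is a minimal sketch of one plausible way such an extension could be structured; it is an illustration under stated assumptions, not the paper's actual Pepper algorithm. It assumes the RSG is decomposed state by state: each state is treated as a repeated normal-form game whose payoffs blend observed immediate rewards with bootstrapped estimates of future value, and a normal-form learner runs independently at each state. All names (StageGameLearner, PepperStyleAgent) and the specific blending rule are hypothetical.

```python
import random
from collections import defaultdict


class StageGameLearner:
    """Placeholder for any repeated normal-form game algorithm.

    Here: greedy play against current payoff estimates with
    epsilon-exploration. A real MAL algorithm (e.g. a no-regret
    learner) would slot in behind the same act(payoffs) interface.
    """

    def __init__(self, n_actions, epsilon=0.1):
        self.n_actions = n_actions
        self.epsilon = epsilon

    def act(self, payoffs):
        # payoffs[a] = current estimated value of playing action a
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: payoffs[a])


class PepperStyleAgent:
    """Hypothetical lift of a stage-game learner to a stochastic game.

    The payoff handed to the per-state learner is an estimate of
    r(s, a) + gamma * V(s'), i.e. observed immediate reward plus a
    bootstrapped value of the successor state, so the stage learner's
    "payoff matrix" reflects the full RSG return. This blending rule
    is an assumption made for illustration only.
    """

    def __init__(self, n_actions, gamma=0.95, alpha=0.1):
        self.n_actions = n_actions
        self.gamma = gamma
        self.alpha = alpha
        self.q = defaultdict(lambda: [0.0] * n_actions)  # per-state payoffs
        self.learners = defaultdict(lambda: StageGameLearner(n_actions))

    def act(self, state):
        # The normal-form learner sees only this state's estimated game.
        return self.learners[state].act(self.q[state])

    def update(self, state, action, reward, next_state):
        # Temporal-difference update of the per-state payoff estimates.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


if __name__ == "__main__":
    # Toy two-state, two-action RSG with random transitions,
    # included only to exercise the sketch end to end.
    agent = PepperStyleAgent(n_actions=2)
    state = 0
    for _ in range(1000):
        a = agent.act(state)
        next_state = random.randrange(2)
        reward = 1.0 if (state == 1 and a == 0) else 0.0
        agent.update(state, a, reward, next_state)
        state = next_state
```

Because the wrapper only requires an act(payoffs) interface, swapping in different repeated normal-form game learners would yield different derived algorithms, which is consistent with the abstract's claim that the construction produces a family of new algorithms.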