In this paper we take a closer look at a recently proposed classification scheme for multiagent learning algorithms. Based on this scheme, an exploitation mechanism (which we call the Exploiter) was developed that could beat various Policy Hill Climbers (PHC) and other fair opponents in some repeated matrix games. We show, on the contrary, that some fair opponents may actually beat the Exploiter in repeated games. This indicates a deficiency in the original classification scheme, which we address. Specifically, we introduce a new measure, Reactivity, which quantifies how quickly a learner can adapt to an unexpected, hypothetical change in the opponent's policy. We show that in some games this measure can approximately predict a player's performance, and we use it to explain the behaviors of various algorithms in the Matching Pennies game, behaviors that the original scheme could not explain. Finally, we show that under certain restrictions, a player that consciously tries to avoid exploitation may be unable to do so.
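To ground the terms used in the abstract, the following is a minimal, illustrative sketch (not code from the paper) of a Policy Hill Climber in the repeated Matching Pennies game, together with a reactivity-style probe: the opponent's fixed policy is flipped abruptly partway through play, and we watch how quickly the learner's policy tracks the change. The hyperparameters, the probe design, and all identifiers are assumptions made here for illustration only.

    import random

    # Matching Pennies payoff for the row player: +1 on a match, -1 otherwise.
    def payoff(a, b):
        return 1 if a == b else -1

    class PHC:
        """Policy Hill Climbing, reduced to a single-state (bandit-style) sketch.
        alpha and delta are illustrative values, not taken from the paper."""
        def __init__(self, n_actions=2, alpha=0.1, delta=0.01):
            self.q = [0.0] * n_actions
            self.pi = [1.0 / n_actions] * n_actions
            self.alpha, self.delta = alpha, delta

        def act(self):
            return 0 if random.random() < self.pi[0] else 1

        def update(self, a, r):
            # Q update (single state, so no bootstrapped next-state value).
            self.q[a] += self.alpha * (r - self.q[a])
            # Hill-climb: shift probability mass toward the greedy action.
            best = max(range(len(self.q)), key=lambda i: self.q[i])
            for i in range(len(self.pi)):
                step = self.delta if i == best else -self.delta / (len(self.pi) - 1)
                self.pi[i] = min(1.0, max(0.0, self.pi[i] + step))
            total = sum(self.pi)
            self.pi = [p / total for p in self.pi]

    # Reactivity-style probe (one reading of the paper's idea): play against a
    # biased fixed opponent, flip the opponent's bias abruptly, and observe how
    # many steps the learner needs before its policy tracks the change again.
    learner = PHC()
    opponent_bias = 0.9            # P(opponent plays action 0)
    for t in range(20000):
        if t == 10000:
            opponent_bias = 0.1    # the unexpected policy change
        a = learner.act()
        b = 0 if random.random() < opponent_bias else 1
        learner.update(a, payoff(a, b))
        if t in (9999, 10500, 12000):
            print(f"t={t}: P(learner plays 0) = {learner.pi[0]:.2f}")

Under this reading, a learner with higher Reactivity would show the printed probability recrossing 0.5 sooner after the flip at t=10000; the step-size delta is what bounds how fast a PHC of this form can ever react.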