In this paper we take a closer look at a recently proposed classification scheme for multiagent learning algorithms. Based on this scheme, an exploitation mechanism (which we call the Exploiter) was developed that could beat various Policy Hill Climbers (PHC) and other fair opponents in some repeated matrix games. We show, on the contrary, that some fair opponents may actually beat the Exploiter in repeated games. This indicates a deficiency in the original classification scheme, which we address. Specifically, we introduce a new measure, Reactivity, which quantifies how quickly a learner can adapt to an unexpected, hypothetical change in the opponent's policy. We show that in some games this measure can approximately predict a player's performance, and we use it to explain the behaviors of various algorithms in the Matching Pennies game, behaviors that the original scheme could not explain. Finally, we show that under certain restrictions, a player that consciously tries to avoid exploitation may be unable to do so.
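To ground the terms used in the abstract, the following is a minimal, illustrative sketch (not code from the paper) of a Policy Hill Climber in the repeated Matching Pennies game, together with a reactivity-style probe: the opponent's fixed policy is flipped abruptly partway through play, and we watch how quickly the learner's policy tracks the change. The hyperparameters, the probe design, and all identifiers are assumptions made here for illustration only.

    import random

    # Matching Pennies payoff for the row player: +1 on a match, -1 otherwise.
    def payoff(a, b):
        return 1 if a == b else -1

    class PHC:
        """Policy Hill Climbing, reduced to a single-state (bandit-style) sketch.
        alpha and delta are illustrative values, not taken from the paper."""
        def __init__(self, n_actions=2, alpha=0.1, delta=0.01):
            self.q = [0.0] * n_actions
            self.pi = [1.0 / n_actions] * n_actions
            self.alpha, self.delta = alpha, delta

        def act(self):
            return 0 if random.random() < self.pi[0] else 1

        def update(self, a, r):
            # Q update (single state, so no bootstrapped next-state value).
            self.q[a] += self.alpha * (r - self.q[a])
            # Hill-climb: shift probability mass toward the greedy action.
            best = max(range(len(self.q)), key=lambda i: self.q[i])
            for i in range(len(self.pi)):
                step = self.delta if i == best else -self.delta / (len(self.pi) - 1)
                self.pi[i] = min(1.0, max(0.0, self.pi[i] + step))
            total = sum(self.pi)
            self.pi = [p / total for p in self.pi]

    # Reactivity-style probe (one reading of the paper's idea): play against a
    # biased fixed opponent, flip the opponent's bias abruptly, and observe how
    # many steps the learner needs before its policy tracks the change again.
    learner = PHC()
    opponent_bias = 0.9            # P(opponent plays action 0)
    for t in range(20000):
        if t == 10000:
            opponent_bias = 0.1    # the unexpected policy change
        a = learner.act()
        b = 0 if random.random() < opponent_bias else 1
        learner.update(a, payoff(a, b))
        if t in (9999, 10500, 12000):
            print(f"t={t}: P(learner plays 0) = {learner.pi[0]:.2f}")

Under this reading, a learner with higher Reactivity would show the printed probability recrossing 0.5 sooner after the flip at t=10000; the step-size delta is what bounds how fast a PHC of this form can ever react.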