Multi-agent reinforcement learning (MRL) is a growing area of research. What makes it particularly challenging is that multiple learners render each other's environments non-stationary. In addition to adapting their behavior to other learning agents, online learners must provide assurances about their online performance in order to promote user trust in adaptive agent systems deployed in real-world applications. In this article, instead of developing new algorithms with such assurances, we study the question of safety in the online performance of some existing MRL algorithms. We identify the key notion of the reactivity of a learner by analyzing how an algorithm (PHC-Exploiter), designed to exploit simpler opponents, can itself be exploited by them. We quantify and analyze this notion of reactivity in the context of these algorithms to explain their experimental behavior. We argue that no learner can be designed that deliberately avoids all exploitation. We also show that any attempt to optimize reactivity must account for a tradeoff with sensitivity to noise, and we devise an adaptive method, based on environmental feedback, designed to maximize the learner's safety and minimize its sensitivity to noise.
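As a concrete anchor for the algorithms discussed, the following is a minimal sketch of the base policy hill-climbing (PHC) learner that an exploiting algorithm such as PHC-Exploiter targets, specialized here to a stateless repeated matrix game. The class name, default step sizes, simplex projection, and single-state simplification are illustrative assumptions, not the implementation studied in the article.

```python
import numpy as np

class PHC:
    """Minimal policy hill-climbing learner for a repeated matrix game.

    Keeps Q-value estimates and a mixed policy pi; after each payoff it
    nudges pi toward the currently greedy action by a fixed step delta.
    """

    def __init__(self, n_actions, alpha=0.1, delta=0.01):
        self.n = n_actions
        self.alpha = alpha   # Q-value learning rate
        self.delta = delta   # policy step size (the hill-climbing rate)
        self.Q = np.zeros(n_actions)
        self.pi = np.full(n_actions, 1.0 / n_actions)  # uniform mixed policy

    def act(self, rng=np.random):
        # Sample an action from the current mixed policy.
        return int(rng.choice(self.n, p=self.pi))

    def update(self, action, reward):
        # Stateless Q-learning update (no bootstrap term in a repeated game).
        self.Q[action] += self.alpha * (reward - self.Q[action])
        # Shift probability mass toward the greedy action by at most delta.
        greedy = int(np.argmax(self.Q))
        for a in range(self.n):
            if a == greedy:
                self.pi[a] += self.delta
            else:
                self.pi[a] -= self.delta / (self.n - 1)
        # Project back onto the probability simplex.
        self.pi = np.clip(self.pi, 0.0, 1.0)
        self.pi /= self.pi.sum()
```

In this sketch the step size delta is, in effect, the learner's reactivity: a small delta makes the policy slow to respond, which an exploiting opponent can anticipate and take advantage of, while a large delta makes the policy chase noisy payoffs. This is the reactivity-versus-noise-sensitivity tradeoff the abstract refers to.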