Online Multiagent Learning against Memory Bounded Adversaries

Authors:
Doran Chakraborty;Peter Stone
Affiliations:
Department of Computer Sciences, University of Texas, Austin, USA;Department of Computer Sciences, University of Texas, Austin, USA
Venue:
ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
Year:
2008

Citing 10
Cited 4

The dynamics of reinforcement learning in cooperative multiagent systems

AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Convergence of Gradient Dynamics with a Variable Learning Rate

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Nash Convergence of Gradient Dynamics in General-Sum Games

UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence
Implicit Negotiation in Repeated Games

ATAL '01 Revised Papers from the 8th International Workshop on Intelligent Agents VIII
R-max - a general polynomial time algorithm for near-optimal reinforcement learning

The Journal of Machine Learning Research
Efficient learning of multi-step best response

Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems
Learning against multiple opponents

AAMAS '06 Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems
RVσ(t): a unifying approach to performance and convergence in online multiagent learning

AAMAS '06 Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems
Rational and convergent learning in stochastic games

IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
Learning against opponents with bounded memory

IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence

Leading ad hoc agents in joint action settings with multiple teammates

Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Cooperating with a markovian ad hoc teammate

Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems
Teaching and leading an ad hoc teammate: Collaboration without pre-coordination

Artificial Intelligence
Multiagent learning in the presence of memory-bounded agents

Autonomous Agents and Multi-Agent Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

The traditional agenda in Multiagent Learning (MAL) has been to develop learners that guarantee convergence to an equilibrium in self-play or that converge to playing the best response against an opponent using one of a fixed setof known targeted strategies. This paper introduces an algorithm called Learn or Exploit for Adversary Induced Markov Decision Process (LoE-AIM) that targets optimality against any learning opponent that can be treated as a memory bounded adversary. LoE-AIMmakes no prior assumptions about the opponent and is tailored to optimally exploit any adversary which induces a Markov decision process in the state space of joint histories. LoE-AIMeither explores and gathers new information about the opponent or converges to the best response to the partially learned opponent strategy in repeated play. We further extend LoE-AIMto account for online repeated interactions against the same adversary with plays against other adversaries interleaved in between. LoE-AIM-repeatedstores learned knowledge about an adversary, identifies the adversary in case of repeated interaction, and reuses the stored knowledge about the behavior of the adversary to enhance learning in the current epoch of play. LoE-AIM and LoE-AIM-repeated are fully implemented, with results demonstrating their superiority over other existing MAL algorithms.