Multiagent learning in the presence of memory-bounded agents
Autonomous Agents and Multi-Agent Systems
The traditional agenda in Multiagent Learning (MAL) has been to develop learners that guarantee convergence to an equilibrium in self-play, or that converge to playing the best response against an opponent drawn from a fixed set of known targeted strategies. This paper introduces an algorithm called Learn or Exploit for Adversary Induced Markov Decision Process (LoE-AIM) that targets optimality against any learning opponent that can be treated as a memory-bounded adversary. LoE-AIM makes no prior assumptions about the opponent and is tailored to optimally exploit any adversary that induces a Markov decision process over the state space of joint histories. In repeated play, LoE-AIM either explores to gather new information about the opponent or converges to the best response to the partially learned opponent strategy. We further extend LoE-AIM to handle online repeated interactions against the same adversary, with plays against other adversaries interleaved in between. This extension, LoE-AIM-repeated, stores learned knowledge about an adversary, identifies the adversary in case of repeated interaction, and reuses the stored knowledge of the adversary's behavior to enhance learning in the current epoch of play. LoE-AIM and LoE-AIM-repeated are fully implemented, with results demonstrating their superiority over other existing MAL algorithms.
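The core idea — that a memory-bounded opponent induces an ordinary MDP over joint-history states, which the learner can model from observed play and then best-respond to with standard dynamic programming — can be illustrated with a minimal sketch. This is not the authors' LoE-AIM implementation; it is a toy example under stated assumptions: the game is the iterated prisoner's dilemma, the opponent is a hypothetical memory-1 strategy (Tit-for-Tat, which repeats the learner's previous action), so the induced MDP's state reduces to the learner's last action. The function names (`learn_opponent_model`, `best_response`) and all parameters are illustrative choices, not part of the paper.

```python
import random
from collections import defaultdict

# Learner's payoffs in the prisoner's dilemma (first entry = learner's action).
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
ACTIONS = ['C', 'D']


def tit_for_tat(my_last):
    # Memory-1 opponent: repeats the learner's previous action (cooperates first).
    return my_last if my_last else 'C'


def learn_opponent_model(n_steps=200, seed=0):
    """Estimate P(opponent action | learner's previous action) from random play.

    This is the "learn" phase: gather data about the adversary's conditional
    behavior, which defines the transition/reward structure of the induced MDP.
    """
    rng = random.Random(seed)
    counts = defaultdict(lambda: defaultdict(int))
    my_last = None
    for _ in range(n_steps):
        opp = tit_for_tat(my_last)          # opponent reacts to the history
        me = rng.choice(ACTIONS)            # exploratory (random) play
        if my_last is not None:
            counts[my_last][opp] += 1
        my_last = me
    return {s: {a: c[a] / sum(c.values()) for a in ACTIONS}
            for s, c in counts.items()}


def best_response(model, gamma=0.9, iters=500):
    """Value iteration on the induced MDP.

    Against a memory-1 opponent whose reaction depends only on the learner's
    last action, the state IS the learner's last action, and playing action a
    deterministically moves the MDP to state a.
    """
    V = {s: 0.0 for s in ACTIONS}
    for _ in range(iters):
        V = {s: max(sum(model[s].get(o, 0.0) * (PAYOFF[(a, o)] + gamma * V[a])
                        for o in ACTIONS)
                    for a in ACTIONS)
             for s in ACTIONS}
    return {s: max(ACTIONS,
                   key=lambda a: sum(model[s].get(o, 0.0) *
                                     (PAYOFF[(a, o)] + gamma * V[a])
                                     for o in ACTIONS))
            for s in ACTIONS}
```

With a sufficiently patient discount factor, the computed best response against Tit-for-Tat is to cooperate from every state (mutual cooperation's long-run value beats any defection cycle), which matches the standard game-theoretic analysis. The "explore or exploit" decision of the full algorithm — choosing between further information gathering and committing to the best response against the partially learned model — is not modeled here.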