In a competitive environment, a satisfactory multi-agent learning algorithm should, at a minimum, be rational and convergent. Exploiter-PHC (referred to here as Exploiter) can beat many fair opponents, but it is neither rational against stationary policies nor convergent in self-play, and it may even be beaten by some fair opponents in a lower league. This paper proposes an improved algorithm based on Exploiter, named ExploiterWT (Exploiter With Testing). The basic idea of ExploiterWT is to add a testing period in which the Nash equilibrium policy is estimated. ExploiterWT can satisfy both of the properties above. Unlike Exploiter, it does not need the Nash equilibrium as a priori knowledge when it begins exploiting. ExploiterWT can also avoid being beaten by some fair opponents in a lower league. The paper first introduces the ideas behind the algorithm and then presents experimental results obtained against other algorithms in the Matching Pennies game.
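To make the rationality criterion concrete, the following is a minimal sketch of the Matching Pennies game and of a best response to a stationary opponent. It is not the ExploiterWT algorithm, whose details the abstract does not give. The names `PAYOFF`, `expected_payoff`, and `best_response_row` are hypothetical, and the payoffs assume the standard zero-sum form in which the row player wins 1 on a match and loses 1 otherwise.

```python
# Matching Pennies payoffs for the row player (assumed standard form).
# Actions: 0 = Heads, 1 = Tails. The column player receives the negative.
PAYOFF = [[1.0, -1.0],
          [-1.0, 1.0]]


def expected_payoff(p_heads_row, p_heads_col):
    """Expected payoff to the row player when both players use mixed policies."""
    row = [p_heads_row, 1.0 - p_heads_row]
    col = [p_heads_col, 1.0 - p_heads_col]
    return sum(row[a] * col[b] * PAYOFF[a][b]
               for a in range(2) for b in range(2))


def best_response_row(p_heads_col):
    """Probability of Heads in a best response to a stationary column policy."""
    value_heads = expected_payoff(1.0, p_heads_col)
    value_tails = expected_payoff(0.0, p_heads_col)
    if value_heads > value_tails:
        return 1.0
    if value_tails > value_heads:
        return 0.0
    return 0.5  # indifferent: every mix is a best response


if __name__ == "__main__":
    # Against an opponent that plays Heads 70% of the time, a rational
    # learner should end up playing Heads (to match) with probability 1.
    print(best_response_row(0.7))
    # At the mixed Nash equilibrium (0.5, 0.5) the game value is zero.
    print(expected_payoff(0.5, 0.5))
```

In this sketch, a learner is rational in the abstract's sense if its policy converges to the output of `best_response_row` whenever the opponent's policy is fixed. The equilibrium policy that a testing period would need to estimate is, for this game, the uniform mix over Heads and Tails.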