Exploiting based pre-testing in competition environment

Authors:
Li-ming Wang;Yang Bai
Affiliations:
Information Engineering School of Zhengzhou University, Zhengzhou, China;Information Engineering School of Zhengzhou University, Zhengzhou, China
Venue:
PRIMA'06 Proceedings of the 9th Pacific Rim international conference on Agent Computing and Multi-Agent Systems
Year:
2006

Abstract

In a competitive environment, a satisfactory multi-agent learning algorithm should, at a minimum, be rational and convergent. Exploiter-PHC (referred to here as Exploiter) can beat many fair opponents, but it is neither rational against stationary policies nor convergent in self-play, and it can even be beaten by some fair opponents in a lower league. This paper proposes an improved algorithm based on Exploiter, named ExploiterWT (Exploiter With Testing). The basic idea of ExploiterWT is to add a testing period in which the Nash equilibrium policy is estimated. ExploiterWT satisfies both of the properties above. Unlike Exploiter, it does not need the Nash equilibrium as a priori knowledge when it begins exploiting. ExploiterWT can also avoid being beaten by some fair opponents in a lower league. The paper first introduces the ideas behind the algorithm and then presents experimental results obtained against other algorithms in the Matching Pennies game.
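To make the two properties concrete, the following is a minimal sketch of the Matching Pennies game itself. It is not the ExploiterWT algorithm, whose details the abstract does not give. The names `PAYOFF`, `expected_payoff`, and `best_response` are hypothetical and chosen only for illustration. The sketch assumes the standard payoffs of +1 for a match and -1 for a mismatch (from the row player's side), and it takes "rational" in the usual sense of playing a best response against a stationary opponent.

```python
# Illustrative only: the Matching Pennies game, not the authors' algorithm.
# Row player's payoffs: +1 when the two coins match, -1 otherwise (assumed).
PAYOFF = [[1.0, -1.0],
          [-1.0, 1.0]]


def expected_payoff(p_heads, q_heads):
    """Expected payoff to the row player when the row player shows Heads
    with probability p_heads and the column player with probability q_heads."""
    p = [p_heads, 1.0 - p_heads]
    q = [q_heads, 1.0 - q_heads]
    return sum(p[i] * q[j] * PAYOFF[i][j] for i in range(2) for j in range(2))


def best_response(q_heads):
    """Row player's best response (probability of Heads) to a stationary
    opponent that shows Heads with probability q_heads."""
    if q_heads > 0.5:
        return 1.0   # opponent favours Heads, so match it with Heads
    if q_heads < 0.5:
        return 0.0   # opponent favours Tails, so match it with Tails
    return 0.5       # opponent is unexploitable; any mixture is a best response


if __name__ == "__main__":
    # At the Nash equilibrium policy (0.5) the payoff is zero whatever
    # the opponent does, so the policy cannot be exploited.
    for q in (0.0, 0.3, 0.7, 1.0):
        print(q, expected_payoff(0.5, q))

    # Against a stationary, biased opponent the best response earns a
    # positive expected payoff (approximately 0.4 when q_heads is 0.7).
    print(expected_payoff(best_response(0.7), 0.7))
```

The equilibrium policy guarantees a payoff of zero but gains nothing from a biased opponent, whereas a best response earns more against a stationary policy and can itself be exploited once it leaves the equilibrium. This is the tension between exploiting opponents and staying safe that the abstract describes.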