Exploiting based pre-testing in competition environment

  • Authors:
  • Li-ming Wang; Yang Bai

  • Affiliations:
  • Information Engineering School of Zhengzhou University, Zhengzhou, China (both authors)

  • Venue:
  • PRIMA'06: Proceedings of the 9th Pacific Rim International Conference on Agent Computing and Multi-Agent Systems
  • Year:
  • 2006

Abstract

In a competitive environment, a satisfactory multi-agent learning algorithm should, at a minimum, be rational and convergent. Exploiter-PHC (written as Exploiter below) can beat many fair opponents, but it is neither rational against stationary policies nor convergent in self-play, and it can even be beaten by some fair opponents in a lower league. This paper proposes an improved algorithm, ExploiterWT (Exploiter With Testing), based on Exploiter. The basic idea of ExploiterWT is to add a testing period that estimates the Nash equilibrium policy. ExploiterWT satisfies the properties mentioned above, does not need the Nash equilibrium as a priori knowledge when it begins exploiting (as Exploiter does), and avoids being beaten by fair opponents in a lower league. The paper first introduces the ideas behind the algorithm and then presents experimental results obtained against other algorithms in the game of Matching Pennies.
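
The abstract only outlines the idea, so the following Python is a minimal, speculative sketch of a "test then exploit" loop rather than the authors' algorithm. It assumes a two-action Matching Pennies game, a standard policy-hill-climbing (PHC) opponent, a uniform probing policy during the testing period, and fixed phase lengths; the class names and the test_len/exploit_len parameters are illustrative assumptions, and the Nash-policy estimate is taken to be the opponent's empirical action frequencies observed while testing.

```python
import random

# Matching-pennies payoff for the row (matching) player:
# +1 if both players choose the same action, -1 otherwise.
# The column player's payoff is the negation (zero-sum).
def payoff(a_row, a_col):
    return 1.0 if a_row == a_col else -1.0


class PHC:
    """Plain policy-hill-climbing learner, used here as a stand-in 'fair' opponent."""

    def __init__(self, n_actions=2, alpha=0.1, delta=0.01):
        self.n = n_actions
        self.q = [0.0] * n_actions
        self.pi = [1.0 / n_actions] * n_actions
        self.alpha, self.delta = alpha, delta

    def act(self):
        # Sample an action from the current mixed policy.
        r, acc = random.random(), 0.0
        for a, p in enumerate(self.pi):
            acc += p
            if r <= acc:
                return a
        return self.n - 1

    def update(self, action, reward):
        # Q-learning step for the chosen action.
        self.q[action] += self.alpha * (reward - self.q[action])
        # Move the mixed policy a small step toward the greedy action.
        best = max(range(self.n), key=lambda i: self.q[i])
        for i in range(self.n):
            step = self.delta if i == best else -self.delta / (self.n - 1)
            self.pi[i] = min(1.0, max(0.0, self.pi[i] + step))
        total = sum(self.pi)
        self.pi = [p / total for p in self.pi]


class ExploiterWT:
    """Hypothetical sketch of the testing-then-exploiting idea: during a
    testing phase, play a uniform probe policy and record the opponent's
    action frequencies as an estimate of its (near-Nash) mix; during an
    exploiting phase, best-respond to that estimate."""

    def __init__(self, n_actions=2, test_len=200, exploit_len=800):
        self.n = n_actions
        self.test_len, self.exploit_len = test_len, exploit_len
        self.phase, self.t = "test", 0
        self.counts = [0] * n_actions
        self.estimate = [1.0 / n_actions] * n_actions

    def act(self):
        if self.phase == "test":
            return random.randrange(self.n)  # uniform probing (an assumption)
        # Best response of the row player to the estimated opponent mix.
        ev = [sum(self.estimate[b] * payoff(a, b) for b in range(self.n))
              for a in range(self.n)]
        return max(range(self.n), key=lambda a: ev[a])

    def observe(self, opp_action):
        self.t += 1
        if self.phase == "test":
            self.counts[opp_action] += 1
            if self.t >= self.test_len:
                total = sum(self.counts)
                self.estimate = [c / total for c in self.counts]
                self.phase, self.t = "exploit", 0
        elif self.t >= self.exploit_len:
            self.phase, self.t = "test", 0
            self.counts = [0] * self.n


if __name__ == "__main__":
    row, col = ExploiterWT(), PHC()
    score = 0.0
    for _ in range(20000):
        a, b = row.act(), col.act()
        r = payoff(a, b)
        score += r
        row.observe(b)
        col.update(b, -r)  # column player's reward is the negation
    print("ExploiterWT cumulative payoff vs. PHC:", round(score, 1))
```

In Matching Pennies the unique Nash equilibrium is the 50/50 mix, so once the testing-phase estimate deviates from one half, the exploiting phase in this sketch plays a pure best response against the stationary part of the opponent's behavior.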