Reinforcement learning with n-tuples on the game Connect-4

  • Authors:
  • Markus Thill;Patrick Koch;Wolfgang Konen

  • Affiliations:
  • Department of Computer Science, Cologne University of Applied Sciences, Gummersbach, Germany;Department of Computer Science, Cologne University of Applied Sciences, Gummersbach, Germany;Department of Computer Science, Cologne University of Applied Sciences, Gummersbach, Germany

  • Venue:
  • PPSN'12 Proceedings of the 12th international conference on Parallel Problem Solving from Nature - Volume Part I
  • Year:
  • 2012

Abstract

Learning complex game functions is still a difficult task. We apply temporal difference learning (TDL), a well-known variant of the reinforcement learning approach, in combination with n-tuple networks to the game Connect-4. Our agent is trained just by self-play. It is able, for the first time, to consistently beat the optimal-playing Minimax agent (in game situations where a win is possible). The n-tuple network induces a mighty feature space: It is not necessary to design certain features, but the agent learns to select the right ones. We believe that the n-tuple network is an important ingredient for the overall success and identify several aspects that are relevant for achieving high-quality results. The architecture is sufficiently general to be applied to similar reinforcement learning tasks as well.
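As a rough illustration of the idea in the abstract, the sketch below shows how an n-tuple network can serve as a value function that is adjusted by a temporal-difference style update. It is a minimal sketch under stated assumptions, not the authors' implementation: the class name `NTupleNetwork`, the helper `random_tuples`, the three-state cell encoding, the tanh output, and the learning rate are all hypothetical choices made for this example.

```python
import math
import random

ROWS, COLS = 6, 7        # standard Connect-4 board size
STATES_PER_CELL = 3      # assumption: 0 = empty, 1 = first player, 2 = second player


class NTupleNetwork:
    """Value function built as a sum of lookup tables, one per n-tuple of cells."""

    def __init__(self, tuples):
        self.tuples = tuples
        # One weight table per tuple, indexed by the base-3 code of its cells.
        self.tables = [[0.0] * (STATES_PER_CELL ** len(t)) for t in tuples]

    def _index(self, board, cells):
        # Read the cell states of one tuple as digits of a base-3 number.
        idx = 0
        for c in cells:
            idx = idx * STATES_PER_CELL + board[c]
        return idx

    def value(self, board):
        total = sum(tab[self._index(board, t)]
                    for tab, t in zip(self.tables, self.tuples))
        return math.tanh(total)  # squash the estimate into (-1, 1)

    def td_update(self, board, target, alpha=0.01):
        # In TD learning the target would typically be the reward plus the
        # value of the successor position; here it is passed in directly.
        v = self.value(board)
        delta = target - v
        grad = 1.0 - v * v       # derivative of tanh at the current output
        for tab, t in zip(self.tables, self.tuples):
            tab[self._index(board, t)] += alpha * delta * grad
        return delta


def random_tuples(count, length, rng):
    """Hypothetical helper: sample `count` tuples of distinct cell indices."""
    return [rng.sample(range(ROWS * COLS), length) for _ in range(count)]


if __name__ == "__main__":
    rng = random.Random(0)
    net = NTupleNetwork(random_tuples(count=10, length=4, rng=rng))
    board = [0] * (ROWS * COLS)
    board[3] = 1                 # one piece of the first player
    for _ in range(200):
        net.td_update(board, target=1.0, alpha=0.05)
    print(net.value(board))
```

Each tuple only looks at a few cells, so no feature has to be designed by hand: the tables that matter for a position receive larger weights during training, which is one way to read the abstract's remark that the agent learns to select the right features.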