Reinforcement learning with n-tuples on the game Connect-4

Authors:
Markus Thill; Patrick Koch; Wolfgang Konen
Affiliations:
Department of Computer Science, Cologne University of Applied Sciences, Gummersbach, Germany (all authors)
Venue:
PPSN'12 Proceedings of the 12th international conference on Parallel Problem Solving from Nature - Volume Part I
Year:
2012

Abstract

Learning complex game functions is still a difficult task. We apply temporal difference learning (TDL), a well-known variant of reinforcement learning, in combination with n-tuple networks to the game Connect-4. Our agent is trained solely by self-play. It is able, for the first time, to consistently beat the optimal-playing Minimax agent (in game situations where a win is possible). The n-tuple network induces a powerful feature space: no features need to be designed by hand; instead, the agent learns to select the right ones. We believe that the n-tuple network is an important ingredient of the overall success, and we identify several aspects that are relevant for achieving high-quality results. The architecture is sufficiently general to be applied to similar reinforcement learning tasks as well.
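
To make the combination of TDL and n-tuple networks concrete, the following is a minimal Python sketch rather than the authors' implementation. It shows an n-tuple value function as a sum of look-up tables, each indexed by the contents of a small set of board cells, together with a single temporal-difference update step. Everything beyond that general idea is an assumption made for illustration: the names `NTupleNetwork` and `random_walk_tuple`, the 0/1/2 cell encoding, the tanh squashing, the tuple length and count, and the learning rate. Move generation and the self-play loop are omitted.

```python
import math
import random

ROWS, COLS = 6, 7        # standard Connect-4 board
STATES_PER_CELL = 3      # assumed encoding: 0 = empty, 1 = first player, 2 = second player


def random_walk_tuple(length, rng):
    """Pick `length` distinct, connected cells by growing from a random start cell."""
    cells = [(rng.randrange(ROWS), rng.randrange(COLS))]
    steps = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    while len(cells) < length:
        r, c = rng.choice(cells)
        dr, dc = rng.choice(steps)
        nxt = (r + dr, c + dc)
        if 0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS and nxt not in cells:
            cells.append(nxt)
    return cells


class NTupleNetwork:
    """Value function built from one look-up table (LUT) per n-tuple."""

    def __init__(self, tuples, alpha=0.01):
        self.tuples = tuples   # list of lists of (row, col) cells
        self.alpha = alpha     # learning rate (assumed value)
        self.luts = [[0.0] * (STATES_PER_CELL ** len(t)) for t in tuples]

    def _indices(self, board):
        # Read each tuple's cells as the digits of a base-3 number.
        indices = []
        for cells in self.tuples:
            k = 0
            for r, c in cells:
                k = k * STATES_PER_CELL + board[r][c]
            indices.append(k)
        return indices

    def value(self, board):
        total = sum(lut[k] for lut, k in zip(self.luts, self._indices(board)))
        return math.tanh(total)  # squash into (-1, 1)

    def td_update(self, board, target):
        """Move V(board) toward `target`: the final reward at a terminal state,
        or the value of the successor state otherwise."""
        v = self.value(board)
        delta = target - v
        step = self.alpha * delta * (1.0 - v * v)  # (1 - v^2) is the tanh derivative
        for lut, k in zip(self.luts, self._indices(board)):
            lut[k] += step
        return delta


# Usage: nudge the value of a single position toward a target of 0.5.
rng = random.Random(0)
net = NTupleNetwork([random_walk_tuple(4, rng) for _ in range(8)], alpha=0.01)
board = [[0] * COLS for _ in range(ROWS)]
board[0][3] = 1
for _ in range(200):
    net.td_update(board, target=0.5)
```

Only the LUT entries that the current position activates are touched by an update. This is why the agent can be said to select its own features: weights for cell patterns that never occur, or that carry no signal, stay near zero.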