Self-teaching adaptive dynamic programming for Gomoku

Authors:
Dongbin Zhao; Zhen Zhang; Yujie Dai
Affiliations:
State Key Laboratory of Intelligent Control and Management of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (all three authors)
Venue:
Neurocomputing
Year:
2012

Abstract

In this paper, adaptive dynamic programming (ADP) is applied to learning to play Gomoku. A critic network is used to evaluate board situations. The basic idea is to penalize the last move taken by the loser and reward the last move selected by the winner at the end of each game. The results show that the program improves its performance by playing against itself and approaches the candidate level of 5-star Gomoku, a commercial Gomoku program. We also examine how two methods of generating training games influence learning, namely self-teaching and learning by watching two experts play against each other, and present the comparison results together with the reasons for the differences.
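The abstract states the end-of-game training rule only in words. Below is a minimal sketch of that rule in a self-teaching loop. It rests on several assumptions that are ours, not the paper's: a 7x7 board instead of the usual 15x15, a linear-sigmoid critic over raw stone positions in place of the paper's critic network and its inputs, epsilon-greedy move selection, and TD(0) backups for earlier moves in which a player's afterstate value is pushed toward one minus the opponent's next afterstate value. All names (`five_in_row`, `features`, `Critic`, `play_and_learn`, `train`) are hypothetical.

```python
import math
import random

SIZE = 7     # assumption: a small board keeps the sketch fast (Gomoku is usually 15x15)
WIN_LEN = 5  # five stones in a row win


def five_in_row(board, r, c, player):
    """True if the stone `player` just placed at (r, c) completes WIN_LEN in a row."""
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        count = 1
        for sign in (1, -1):
            rr, cc = r + sign * dr, c + sign * dc
            while 0 <= rr < SIZE and 0 <= cc < SIZE and board[rr][cc] == player:
                count += 1
                rr, cc = rr + sign * dr, cc + sign * dc
        if count >= WIN_LEN:
            return True
    return False


def features(board, player):
    """Board from `player`'s view: +1 own stone, -1 opponent stone, 0 empty, plus a bias input."""
    x = [0.0 if v == 0 else (1.0 if v == player else -1.0) for row in board for v in row]
    x.append(1.0)
    return x


class Critic:
    """Hypothetical linear-sigmoid critic: V(x) estimates P(the player who just moved wins)."""

    def __init__(self, rng):
        self.w = [rng.uniform(-0.01, 0.01) for _ in range(SIZE * SIZE + 1)]

    def value(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, target, lr):
        """One gradient step on the squared error 0.5 * (target - V(x)) ** 2."""
        v = self.value(x)
        g = (target - v) * v * (1.0 - v)
        self.w = [wi + lr * g * xi for wi, xi in zip(self.w, x)]


def play_and_learn(critic, rng, lr=0.1, epsilon=0.1):
    """Play one self-play game, then back up values; returns the winner (0 = draw)."""
    board = [[0] * SIZE for _ in range(SIZE)]
    history = []  # afterstate features, each from the view of the player who just moved
    player, winner = 1, 0
    while True:
        empties = [(r, c) for r in range(SIZE) for c in range(SIZE) if board[r][c] == 0]
        if not empties:
            break  # board full: draw
        if rng.random() < epsilon:
            r, c = rng.choice(empties)  # exploratory move
        else:
            best = -1.0
            for mr, mc in empties:  # greedy: pick the afterstate the critic rates highest
                board[mr][mc] = player
                v = critic.value(features(board, player))
                board[mr][mc] = 0
                if v > best:
                    best, r, c = v, mr, mc
        board[r][c] = player
        history.append(features(board, player))
        if five_in_row(board, r, c, player):
            winner = player
            break
        player = 3 - player

    # Terminal rule from the abstract: reward the winner's last move, penalize the loser's.
    final_target = 1.0 if winner else 0.5  # assumption: a draw pulls both last moves to 0.5
    critic.update(history[-1], final_target, lr)
    critic.update(history[-2], 1.0 - final_target, lr)
    # Earlier moves (assumed TD(0) with a perspective flip): my value ~ 1 - opponent's next value.
    for t in range(len(history) - 3, -1, -1):
        critic.update(history[t], 1.0 - critic.value(history[t + 1]), lr)
    return winner


def train(games=200, seed=0):
    """Self-teaching loop: the same critic plays both sides in every game."""
    rng = random.Random(seed)
    critic = Critic(rng)
    results = {0: 0, 1: 0, 2: 0}
    for _ in range(games):
        results[play_and_learn(critic, rng)] += 1
    return critic, results
```

Calling `train(games=200, seed=0)` plays 200 self-play games and returns the critic together with a tally of draws and wins per side. The learning-by-watching variant mentioned in the abstract could reuse the same backups while generating moves from two fixed expert players instead of the critic itself.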