The Upper Confidence bounds applied to Trees (UCT) algorithm has been shown to perform well in complex games, but it samples undesirable areas of the search space when building its game tree. This paper explores the design and parametric considerations for augmenting the UCT algorithm with an Artificial Neural Network (NN) that dynamically prunes the game tree, thus limiting its size. The expansion phase of UCT is augmented with a trained NN to create a novel UCT-NN variant that includes prior knowledge and strategy. The game of Go-Moku is used to investigate the design and parametric considerations of UCT-NN. The parameters considered are the C parameter, which balances exploration and exploitation, the training and structural design parameters of the NN, and the various pruning schemes that could be used in UCT-NN. Parameter tuning techniques are provided for managing the parametric concerns in the proposed algorithm. Results of parameter experiments indicate that a single value of C = 1.41 is suitable for the games studied. Suitable values were found for the structural and training parameters of the NN, which were required in order to test the various pruning schemes. Of the pruning schemes considered, an exponentially decaying scheme, in which a large number of moves are pruned initially and fewer moves are pruned at deeper plies, is found to be superior in the UCT-NN algorithm.
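The two ideas named in the abstract, a UCT selection score governed by C and a pruning schedule that decays exponentially with ply, can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration rather than the authors' implementation: the score uses the common form mean + C * sqrt(ln N / n), for which C = 1.41 (approximately the square root of 2) recovers the standard UCB1 bound, and the function names, `initial_fraction`, and `decay_rate` are hypothetical. The abstract does not state the actual decay constants or how NN outputs are mapped to move rankings.

```python
import math


def ucb1_score(child_value_sum, child_visits, parent_visits, c=1.41):
    """UCT selection score for one child (assumed form: mean + c*sqrt(ln N / n)).

    Unvisited children score +inf so that each is tried at least once.
    """
    if child_visits == 0:
        return math.inf
    mean = child_value_sum / child_visits
    return mean + c * math.sqrt(math.log(parent_visits) / child_visits)


def prune_fraction(ply, initial_fraction=0.8, decay_rate=0.5):
    """Hypothetical exponential-decay schedule.

    Returns the fraction of candidate moves to prune at the given ply:
    large near the root, shrinking towards zero on deeper plies.
    """
    return initial_fraction * math.exp(-decay_rate * ply)


def prune_moves(moves, nn_scores, ply, initial_fraction=0.8, decay_rate=0.5):
    """Keep the moves the (assumed) NN scores highest; always keep at least one."""
    n_prune = int(len(moves) * prune_fraction(ply, initial_fraction, decay_rate))
    n_keep = max(1, len(moves) - n_prune)
    ranked = sorted(zip(nn_scores, moves), key=lambda pair: pair[0], reverse=True)
    return [move for _, move in ranked[:n_keep]]


if __name__ == "__main__":
    moves = ["a", "b", "c", "d", "e"]
    scores = [0.1, 0.9, 0.5, 0.3, 0.7]  # placeholder NN outputs, not real data
    for ply in (0, 2, 10):
        print(ply, prune_moves(moves, scores, ply))
```

In a sketch like this, `prune_moves` would be called during the expansion phase to restrict which children are added to the tree, while `ucb1_score` would drive selection among the children that survive pruning.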