The Monte-Carlo tree search algorithm Upper Confidence bounds applied to Trees (UCT) has become extremely popular in computer games research. The Rapid Action Value Estimation (RAVE) heuristic is a strong estimator that often improves the performance of UCT-based algorithms. However, there are situations where RAVE misleads the search, whereas pure UCT search can find the correct solution. Two games, the simple abstract game Sum of Switches (SOS) and the game of Go, are used to study the behavior of the RAVE heuristic. In SOS, RAVE updates are manipulated to mimic game situations in which RAVE misleads the search. Such false RAVE updates are used to create RAVE overestimates and underestimates. A study of the distributions of mean and RAVE values reveals large differences between Go and SOS. While the RAVE-max update rule is able to correct extreme cases of RAVE underestimation, it is not effective in settings closer to practice, nor in Go.
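To make the ideas concrete, the following is a minimal sketch of how a UCT-RAVE node value is typically combined, together with one plausible reading of a RAVE-max style correction. The beta schedule with an "equivalence" parameter `k`, and the function names, are illustrative assumptions, not the paper's exact formulation.

```python
import math

def rave_beta(n, k=1000):
    # Weight given to the RAVE estimate: close to 1 when the node has
    # few direct visits n, decaying toward 0 as n grows. The constant k
    # (an assumed "equivalence" parameter) is the visit count at which
    # both estimates contribute roughly equally.
    if n == 0:
        return 1.0
    return math.sqrt(k / (3 * n + k))

def combined_value(mean, n, rave, k=1000):
    # Blend the node's direct Monte-Carlo mean with its RAVE estimate.
    # With few visits the (noisy but broadly sampled) RAVE value
    # dominates; with many visits the true mean takes over.
    b = rave_beta(n, k)
    return (1 - b) * mean + b * rave

def rave_max_update(rave, mean):
    # Illustrative RAVE-max correction: clamp the RAVE value from below
    # by the node's own mean, so extreme RAVE underestimates cannot
    # drag the combined value under the observed mean.
    return max(rave, mean)
```

A node with 10 direct visits, mean 0.5, and RAVE value 0.8 would still be valued close to its RAVE estimate under this schedule, which is exactly the regime in which manipulated (false) RAVE updates can mislead the search.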