We present a reinforcement learning architecture, Dyna-2, that encompasses both sample-based learning and sample-based search, and that generalises across states during both learning and search. We apply Dyna-2 to high-performance Computer Go. In this domain, the most successful planning methods are based on sample-based search algorithms, such as UCT, in which states are treated individually, and the most successful learning methods are based on temporal-difference learning algorithms, such as Sarsa, in which linear function approximation is used. In both cases, an estimate of the value function is formed, but in the first case it is transient, computed and then discarded after each move, whereas in the second case it is more permanent, slowly accumulating over many moves and games. The idea of Dyna-2 is for the transient planning memory and the permanent learning memory to remain separate, but for both to be based on linear function approximation and both to be updated by Sarsa. To apply Dyna-2 to 9x9 Computer Go, we use a million binary features in the function approximator, based on templates matching small fragments of the board. Using only the transient memory, Dyna-2 performed at least as well as UCT. Using both memories combined, it significantly outperformed UCT. Our program based on Dyna-2 achieved a higher rating on the Computer Go Online Server than any handcrafted or traditional search-based program.
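The two-memory idea can be illustrated with a minimal sketch in Python. It is not the authors' implementation: the class name `TwoMemoryValue`, the method names, the one-step Sarsa(0) form, the additive combination of the two weight vectors, and the step size normalised by the number of active features are all assumptions made for illustration. Binary features are represented as lists of active feature indices.

```python
from typing import List, Sequence


class TwoMemoryValue:
    """Illustrative linear value function with a permanent and a transient memory.

    Hypothetical sketch: names, update form and step-size rule are assumptions,
    not details taken from the abstract.
    """

    def __init__(self, n_features: int, alpha: float = 0.1, gamma: float = 1.0):
        self.theta: List[float] = [0.0] * n_features      # permanent (learning) memory
        self.theta_bar: List[float] = [0.0] * n_features  # transient (search) memory
        self.alpha = alpha
        self.gamma = gamma

    def permanent_value(self, active: Sequence[int]) -> float:
        # Linear value from the permanent memory only.
        return sum(self.theta[i] for i in active)

    def combined_value(self, active: Sequence[int]) -> float:
        # Assumed combination: both memories added over the same binary features.
        return sum(self.theta[i] + self.theta_bar[i] for i in active)

    def learn_update(self, active, reward, next_active, terminal=False) -> float:
        """Sarsa(0) step on the permanent memory, driven by real experience."""
        bootstrap = 0.0 if terminal else self.gamma * self.permanent_value(next_active)
        delta = reward + bootstrap - self.permanent_value(active)
        step = self.alpha / max(1, len(active))  # assumed normalisation
        for i in active:
            self.theta[i] += step * delta
        return delta

    def search_update(self, active, reward, next_active, terminal=False) -> float:
        """Sarsa(0) step on the transient memory, driven by simulated experience."""
        bootstrap = 0.0 if terminal else self.gamma * self.combined_value(next_active)
        delta = reward + bootstrap - self.combined_value(active)
        step = self.alpha / max(1, len(active))  # assumed normalisation
        for i in active:
            self.theta_bar[i] += step * delta    # permanent memory is left untouched
        return delta

    def clear_transient(self) -> None:
        # Discard the search-time corrections; when to do this is a design choice.
        self.theta_bar = [0.0] * len(self.theta_bar)


if __name__ == "__main__":
    v = TwoMemoryValue(n_features=4, alpha=0.5)
    v.learn_update([0, 1], reward=1.0, next_active=[], terminal=True)
    v.search_update([1, 2], reward=0.0, next_active=[], terminal=True)
    print(v.permanent_value([1, 2]), v.combined_value([1, 2]))
```

In this sketch, search only ever adjusts `theta_bar`, so simulated experience refines the value estimate locally without overwriting what was learned from real games, and clearing the transient memory restores the purely learned estimate.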