In reinforcement learning (RL), the trade-off between exploitation and exploration has long been an important issue. This paper presents a new method for controlling the balance between exploitation and exploration. Our learning scheme is based on model-based RL, in which Bayesian inference with a forgetting effect estimates the state-transition probabilities of the environment. The balance parameter, which corresponds to the randomness of action selection, is controlled on the basis of the variation in action outcomes and the perception of environmental change. When applied to maze tasks, our method achieves good control by adapting to environmental changes. Recently, Usher et al. [Science 283 (1999) 549] have suggested that noradrenergic neurons in the locus coeruleus may control the exploitation-exploration balance in the real brain, and that this balance may correspond to the level of the animal's selective attention. In light of this scenario, we also discuss a possible implementation in the brain.
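To make the described scheme concrete, the following Python sketch illustrates the general idea under stated assumptions: a Dirichlet-style estimate of the state-transition probabilities with an exponential forgetting factor, a softmax (Boltzmann) action-selection rule whose inverse temperature plays the role of the balance parameter, and a simple rule that adapts that parameter from the variability of recent outcomes and a change-detection flag. The class and function names (ForgettingBayesModel, adapt_beta), the forgetting rate gamma, and the specific adaptation rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

class ForgettingBayesModel:
    """Dirichlet-style estimate of P(s'|s,a) with exponential forgetting.

    Minimal sketch of a model-based estimator with a forgetting effect;
    the update rule and `gamma` are assumptions for illustration only.
    """

    def __init__(self, n_states, n_actions, prior=1.0, gamma=0.99):
        self.alpha = np.full((n_states, n_actions, n_states), prior)  # pseudo-counts
        self.gamma = gamma   # forgetting factor: old evidence decays toward the prior
        self.prior = prior

    def update(self, s, a, s_next):
        # Decay past counts so recent transitions dominate the estimate,
        # letting the model track a changing environment.
        self.alpha[s, a] = self.gamma * self.alpha[s, a] + (1 - self.gamma) * self.prior
        self.alpha[s, a, s_next] += 1.0

    def transition_probs(self, s, a):
        return self.alpha[s, a] / self.alpha[s, a].sum()


def softmax_action(q_values, beta, rng=np.random.default_rng()):
    """Boltzmann action selection; beta is the exploitation-exploration
    balance parameter (higher beta = greedier, less random)."""
    prefs = beta * (q_values - q_values.max())
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return rng.choice(len(q_values), p=probs)


def adapt_beta(beta, recent_rewards, change_detected,
               k=0.1, beta_min=0.1, beta_max=10.0):
    """Illustrative balance-control rule (an assumption, not the paper's):
    lower beta (more exploration) when outcomes vary widely or an
    environmental change is detected; raise it when outcomes are stable."""
    variability = np.std(recent_rewards) if len(recent_rewards) > 1 else 0.0
    if change_detected or variability > 1.0:
        beta = max(beta_min, beta * (1.0 - k))
    else:
        beta = min(beta_max, beta * (1.0 + k))
    return beta
```

In this sketch the forgetting factor serves the same purpose as described in the abstract (discounting stale evidence so the transition model follows environmental change), while adapt_beta stands in for the balance control driven by outcome variation and change perception.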