In reinforcement learning (RL), the trade-off between exploitation and exploration has long been an important issue. This paper presents a new method for controlling the balance between exploitation and exploration. Our learning scheme is based on model-based RL, in which Bayesian inference with a forgetting effect estimates the state-transition probabilities of the environment. The balance parameter, which corresponds to the randomness of action selection, is controlled on the basis of the variation in action outcomes and the perception of environmental change. When applied to maze tasks, our method achieves good control by adapting to environmental changes. Recently, Usher et al. [Science 283 (1999) 549] have suggested that noradrenergic neurons in the locus coeruleus may control the exploitation-exploration balance in the real brain, and that this balance may correspond to the level of the animal's selective attention. In light of this scenario, we also discuss a possible implementation in the brain.
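To make the described scheme concrete, the following Python sketch illustrates the general idea under stated assumptions: a Dirichlet-style estimate of the state-transition probabilities with an exponential forgetting factor, a softmax (Boltzmann) action-selection rule whose inverse temperature plays the role of the balance parameter, and a simple rule that adapts that parameter from the variability of recent outcomes and a change-detection flag. The class and function names (ForgettingBayesModel, adapt_beta), the forgetting rate gamma, and the specific adaptation rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

class ForgettingBayesModel:
    """Dirichlet-style estimate of P(s'|s,a) with exponential forgetting.

    Minimal sketch of a model-based estimator with a forgetting effect;
    the update rule and `gamma` are assumptions for illustration only.
    """

    def __init__(self, n_states, n_actions, prior=1.0, gamma=0.99):
        self.alpha = np.full((n_states, n_actions, n_states), prior)  # pseudo-counts
        self.gamma = gamma   # forgetting factor: old evidence decays toward the prior
        self.prior = prior

    def update(self, s, a, s_next):
        # Decay past counts so recent transitions dominate the estimate,
        # letting the model track a changing environment.
        self.alpha[s, a] = self.gamma * self.alpha[s, a] + (1 - self.gamma) * self.prior
        self.alpha[s, a, s_next] += 1.0

    def transition_probs(self, s, a):
        return self.alpha[s, a] / self.alpha[s, a].sum()


def softmax_action(q_values, beta, rng=np.random.default_rng()):
    """Boltzmann action selection; beta is the exploitation-exploration
    balance parameter (higher beta = greedier, less random)."""
    prefs = beta * (q_values - q_values.max())
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return rng.choice(len(q_values), p=probs)


def adapt_beta(beta, recent_rewards, change_detected,
               k=0.1, beta_min=0.1, beta_max=10.0):
    """Illustrative balance-control rule (an assumption, not the paper's):
    lower beta (more exploration) when outcomes vary widely or an
    environmental change is detected; raise it when outcomes are stable."""
    variability = np.std(recent_rewards) if len(recent_rewards) > 1 else 0.0
    if change_detected or variability > 1.0:
        beta = max(beta_min, beta * (1.0 - k))
    else:
        beta = min(beta_max, beta * (1.0 + k))
    return beta
```

In this sketch the forgetting factor serves the same purpose as described in the abstract (discounting stale evidence so the transition model follows environmental change), while adapt_beta stands in for the balance control driven by outcome variation and change perception.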