Reinforcement-learning agents with different temperature parameters explain the variety of human action-selection behavior in a Markov decision process task

Authors:
Fumihiko Ishida, Takahiro Sasaki, Yutaka Sakaguchi, Hiroyuki Shimai
Affiliations:
Graduate School of Information Systems, University of Electro-Communications, 1-5-1 Chofu-ga-oka, Chofu, Tokyo 182-8585, Japan
Venue:
Neurocomputing
Year:
2009

Abstract

We investigated the characteristics of human action selection during a Markov decision process (MDP) task and compared them with those of reinforcement-learning (RL) agents. The behavior of the human participants fell roughly into two qualitatively different types. Surprisingly, however, this variety of human behavior could be explained by a single parameter of the RL agents' action-selection rule: the degree of randomness, i.e., the temperature parameter. This result suggests that the diverse action-selection behaviors of humans may be governed by a simple mechanism in the brain.
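The abstract does not specify the MDP task, the learning algorithm, or any parameter values. As a minimal sketch of how a single temperature parameter changes an RL agent's behavior, the code below assumes tabular Q-learning with Boltzmann (softmax) action selection, the usual rule parameterized by a temperature T: P(a | s) = exp(Q(s, a) / T) / Σ_b exp(Q(s, b) / T). Low T makes the agent nearly greedy; high T makes its choices nearly uniform. The three-state toy MDP, the function names (`softmax_probs`, `select_action`, `step`, `run_agent`), and the values of `alpha`, `gamma`, and `n_steps` are hypothetical illustrations, not the task or settings used in the paper.

```python
import math
import random

N_STATES = 3  # hypothetical toy MDP, NOT the task used in the paper


def softmax_probs(q_values, temperature):
    """Boltzmann (softmax) action probabilities: P(a) is proportional to exp(Q(a) / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum for numerical stability; it cancels in the ratio.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, temperature, rng):
    """Sample an action index from the softmax distribution."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def step(state, action):
    """Hypothetical dynamics: action 0 ('advance') moves one state forward and,
    from the last state, returns to state 0 with reward 1.0; action 1 ('reset')
    yields an immediate reward of 0.2 and returns to state 0."""
    if action == 0:
        if state == N_STATES - 1:
            return 0, 1.0
        return state + 1, 0.0
    return 0, 0.2


def run_agent(temperature, n_steps=5000, alpha=0.1, gamma=0.9, seed=0):
    """Q-learning with softmax action selection at a fixed temperature.
    Returns the learned Q-table and the fraction of 'advance' choices."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    state = 0
    n_advance = 0
    for _ in range(n_steps):
        action = select_action(q[state], temperature, rng)
        next_state, reward = step(state, action)
        # Q-learning update: bootstrap from the greedy value of the next state.
        td_target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        n_advance += action == 0
        state = next_state
    return q, n_advance / n_steps


if __name__ == "__main__":
    # Same agent, same task; only the temperature differs between runs.
    for T in (0.05, 0.5, 5.0):
        _, frac = run_agent(T)
        print(f"T={T}: fraction of 'advance' choices = {frac:.2f}")
```

In this sketch every agent shares the same learning rule and task, so any difference in choice patterns across runs comes from T alone, which mirrors the kind of single-parameter account the abstract describes. With a very low temperature the agent can lock onto whichever action first looks better (here, the immediately rewarded 'reset'), while a high temperature keeps it exploring almost at random.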