Neural networks letter: Reinforcement learning for discounted values often loses the goal in the application to animal learning

Authors:
Yoshiya Yamaguchi; Yutaka Sakai
Affiliations:
Graduate School of Brain Sciences, Tamagawa University, Tokyo, Japan; Graduate School of Brain Sciences, Tamagawa University, Tokyo, Japan and Tamagawa University Brain Science Institute, 6-1-1 Tamagawa-gakuen, Machida, Tokyo 194-8610, Japan
Venue:
Neural Networks
Year:
2012

Abstract

An animal's impulsive preference for an immediate reward implies that it may subjectively discount the value of potential future outcomes. Reinforcement learning theory provides an established framework for maximizing such a discounted subjective value, and this framework has been applied successfully in engineering. However, this study identifies a limitation when the framework is applied to animal behavior: in some cases, no learning goal exists. We therefore propose a possible learning framework that is well-posed in all cases and consistent with the impulsive preference.
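To make the notion of discounting concrete, the following is a minimal Python sketch. It assumes the standard exponential discount factor `gamma` used in reinforcement learning. The abstract does not state the discounting form, and the letter's own model may differ. The sketch shows how a discounted valuation can make an agent prefer a smaller immediate reward over a larger delayed one. The function names and reward amounts are hypothetical and chosen only for illustration.

```python
def discounted_value(reward: float, delay: int, gamma: float) -> float:
    """Present value of a single reward received after `delay` steps.

    Assumes exponential discounting: value = gamma**delay * reward.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return (gamma ** delay) * reward


def discounted_return(rewards: list[float], gamma: float) -> float:
    """Discounted sum  sum_t gamma**t * r_t  over a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def prefers_immediate(small: float, large: float, delay: int, gamma: float) -> bool:
    """True if `small` now is valued above `large` after `delay` steps.

    Hypothetical helper illustrating an "impulsive" choice.
    """
    return discounted_value(small, 0, gamma) > discounted_value(large, delay, gamma)


if __name__ == "__main__":
    # Hypothetical choice: 1 unit of reward now vs 2 units after 5 steps.
    for gamma in (0.95, 0.8):
        print(gamma, prefers_immediate(1.0, 2.0, 5, gamma))
    # prints:
    # 0.95 False
    # 0.8 True
```

With `gamma = 0.95` the delayed 2 units are worth about 1.55 in the present, so the larger reward wins. With `gamma = 0.8` they are worth about 0.66, so the immediate reward wins. The sketch illustrates only the impulsive preference itself. It does not model the loss of a learning goal that the letter analyzes.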