An animal's impulsive preference for an immediate reward implies that it may subjectively discount the value of potential future outcomes. Reinforcement learning theory provides an established framework for maximizing this discounted subjective value, and the framework has been applied successfully in engineering. However, this study identifies a limitation when the framework is applied to animal behavior: in some cases, no learning goal exists. The study proposes a possible learning framework that is well-posed in all cases and consistent with the impulsive preference.
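To make the idea of subjective discounting concrete, here is a minimal sketch. It assumes the standard exponential form, in which a reward delayed by t steps is worth gamma^t times its nominal size; the abstract does not state which discounting form the study uses. The function name `discounted_value`, the discount factor 0.9, and the reward sizes and delays are all hypothetical choices for illustration.

```python
def discounted_value(reward, delay, gamma=0.9):
    """Subjective value of `reward` received after `delay` steps.

    Assumes exponential discounting (an illustrative assumption,
    not taken from the abstract): value = gamma ** delay * reward.
    """
    return (gamma ** delay) * reward


# Hypothetical choice: a small reward now versus a larger reward later.
immediate = discounted_value(reward=1.0, delay=0)   # no delay, so no discount
delayed = discounted_value(reward=2.0, delay=10)    # discounted by 0.9 ** 10

# With these numbers the discounted larger reward is worth less than the
# immediate smaller one, which matches an impulsive preference.
prefers_immediate = immediate > delayed
print(immediate, delayed, prefers_immediate)
```

With a discount factor closer to 1, the same comparison would favor the delayed reward, so the strength of the impulsive preference in this sketch depends entirely on the assumed value of `gamma`.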