Hyperbolic discounting of future outcomes is widely observed to underlie choice behavior in animals. Additionally, recent studies (Kobayashi & Schultz, 2008) have reported that hyperbolic discounting is observed even in neural systems underlying choice. However, the most prevalent models of temporal discounting, such as temporal difference learning, assume that future outcomes are discounted exponentially. Exponential discounting has been preferred largely because it can be expressed recursively, whereas hyperbolic discounting has heretofore been thought not to have a recursive definition. In this letter, we define a learning algorithm, hyperbolically discounted temporal difference (HDTD) learning, which constitutes a recursive formulation of the hyperbolic model.
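The key distinction above — that exponential discounting admits a fixed-rate recursion while hyperbolic discounting does not — can be sketched numerically. This is a minimal illustration only, not the HDTD algorithm itself; the rate `gamma` and the hyperbolic steepness `k` are arbitrary example values.

```python
gamma = 0.9   # exponential discount rate (assumed example value)
k = 0.5       # hyperbolic steepness parameter (assumed example value)

def exp_discount(t):
    """Exponential discount weight on an outcome t steps in the future."""
    return gamma ** t

def hyp_discount(t):
    """Hyperbolic discount weight on an outcome t steps in the future."""
    return 1.0 / (1.0 + k * t)

# Exponential discounting is recursive with a constant factor:
# d(t+1) = gamma * d(t) for every t.
for t in range(10):
    assert abs(exp_discount(t + 1) - gamma * exp_discount(t)) < 1e-12

# Hyperbolic discounting has no such constant factor: the step-to-step
# ratio d(t+1)/d(t) depends on t (here it rises toward 1 as t grows).
hyp_ratios = [hyp_discount(t + 1) / hyp_discount(t) for t in range(5)]
print(hyp_ratios)
```

Because the hyperbolic ratio changes with `t`, a naive TD update with a single fixed discount factor cannot reproduce hyperbolic preferences, which is the obstacle the letter's recursive HDTD formulation addresses.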