The temporal difference (TD) learning framework is a major paradigm for understanding value-based decision making and related neural activity (e.g., dopamine signaling). How time is represented in the neural processes that TD models describe, however, is poorly understood. To address this issue, we propose a TD formulation that separates the time of the operator (the neural valuation process), which we refer to as internal time, from the time of the observer (the experimenter), which we refer to as conventional time. We present the formulation and theoretical characteristics of this model, called internal-time TD, and explore the possible consequences of its use in neural value-based decision making. Because the two times are separated, internal-time TD computations, such as the TD error, take different forms depending on both the time frame and the time unit. We examine this operator-observer problem in relation to the time representations used in previous TD models. In intertemporal choice tasks, the internal-time TD value function exhibits the co-appearance of exponential and hyperbolic discounting at different delays. We further examine the effects of internal-time noise on the TD error, the dynamic construction of internal time, and the modulation of internal time under the internal-time hypothesis of serotonin function. Finally, we relate the internal-time TD formulation to research on interval timing and subjective time.
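The co-appearance of exponential and hyperbolic discounting can be illustrated with a minimal sketch. Assuming a logarithmic mapping from conventional time to internal time (a common choice in the subjective-time literature, used here purely for illustration and not necessarily the mapping adopted in the paper), exponential discounting applied in internal time reproduces hyperbolic discounting when viewed in conventional time: with rate k and scale c both equal to 1, exp(-log(1 + t)) = 1/(1 + t).

```python
import math

def internal_time(t, c=1.0):
    # Hypothetical logarithmic mapping from conventional time t
    # to internal (subjective) time.
    return math.log(1.0 + c * t)

def internal_discount(t, k=1.0, c=1.0):
    # Exponential discounting applied in internal time.
    return math.exp(-k * internal_time(t, c))

def hyperbolic_discount(t, k=1.0):
    # Standard hyperbolic discount function in conventional time.
    return 1.0 / (1.0 + k * t)

# With k = c = 1 the two curves coincide exactly:
# exp(-log(1 + t)) = (1 + t)^(-1) = 1/(1 + t).
for t in [0.0, 1.0, 5.0, 20.0]:
    assert abs(internal_discount(t) - hyperbolic_discount(t)) < 1e-12
```

For other parameter values the internal-time discount becomes a power law, (1 + c·t)^(-k/c), which behaves approximately exponentially at short delays and hyperbolically at long delays, consistent with the co-appearance described above.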