Animals increase or decrease their future tendency to emit an action based on whether performing that action has, in the past, resulted in positive or negative reinforcement. An analysis in the companion paper [Zhang, J. (2009). Adaptive learning via selectionism and Bayesianism. Part I: Connection between the two. Neural Networks, 22(3), 220-228] of this selectionist style of learning reveals a resemblance between its ensemble-level dynamics governing the change of action probability and Bayesian learning, in which evidence (in this case, reward) is applied distributively to all action alternatives. Here, this equivalence is further explored in solving the temporal credit-assignment problem during the learning of an action sequence ("operant chain"). Naturally emerging are the notion of secondary (conditioned) reinforcement, which predicts the average reward associated with a stimulus, and the notion of an actor-critic architecture involving concurrent learning of both action probability and reward prediction. While both are consistent with the solutions provided by contemporary reinforcement learning theory (Sutton & Barto, 1998) for optimizing sequential decision-making in stationary Markov environments, we investigate the effect of action learning on reward prediction when the two are carried out concurrently in an on-line scheme.
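The actor-critic arrangement the abstract describes — a critic that learns a reward prediction (secondary reinforcement) for each state, and an actor whose action probabilities are adjusted concurrently by the same prediction-error signal — can be sketched as follows. This is a generic illustrative sketch under assumed update rules (TD(0) critic, softmax actor) and a made-up two-state toy task, not the paper's exact formulation:

```python
import math
import random

# Illustrative actor-critic sketch (assumed, generic update rules).
# Toy task: two states alternate; action 1 taken in state 1 yields reward 1.
# The critic's TD error drives BOTH the reward prediction V and the
# actor's action preferences concurrently, as in the abstract's scheme.

random.seed(0)
N_STATES, N_ACTIONS = 2, 2
ALPHA, BETA, GAMMA = 0.1, 0.1, 0.9   # critic rate, actor rate, discount

V = [0.0] * N_STATES                                  # critic: reward prediction per state
pref = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # actor: action preferences

def sample_action(s):
    """Sample an action from a softmax over the preferences of state s."""
    exps = [math.exp(p) for p in pref[s]]
    r = random.random() * sum(exps)
    acc = 0.0
    for a, e in enumerate(exps):
        acc += e
        if r <= acc:
            return a
    return N_ACTIONS - 1

def step(s, a):
    """Toy environment: reward 1 only for action 1 in state 1; states alternate."""
    reward = 1.0 if (s == 1 and a == 1) else 0.0
    return 1 - s, reward

s = 0
for _ in range(5000):
    a = sample_action(s)
    s2, r = step(s, a)
    delta = r + GAMMA * V[s2] - V[s]   # TD error: surprise in reward prediction
    V[s] += ALPHA * delta              # critic update (secondary reinforcement)
    pref[s][a] += BETA * delta         # actor update uses the same error signal
    s = s2
```

After training, the critic's prediction is highest for the rewarding state (`V[1] > V[0]`) and the actor has shifted probability toward action 1 in state 1 (`pref[1][1] > pref[1][0]`), showing the concurrent on-line learning of reward prediction and action probability that the abstract discusses.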