Simple artificial neural networks that match probability and exploit and explore when confronting a multiarmed bandit

  • Authors:
  • Michael R. W. Dawson; Brian Dupuis; Marcia L. Spetch; Debbie M. Kelly

  • Affiliations:
  • Department of Psychology, University of Alberta, Edmonton, AB, Canada; Department of Psychology, University of Alberta, Edmonton, AB, Canada; Department of Psychology, University of Alberta, Edmonton, AB, Canada; Department of Psychology, University of Saskatchewan, Saskatoon, SK, Canada

  • Venue:
  • IEEE Transactions on Neural Networks
  • Year:
  • 2009

Abstract

The matching law (Herrnstein 1961) states that relative response rates are proportional to relative reinforcement rates; this is related to the empirical phenomenon called probability matching (Vulkan 2000). Here, we show that a simple artificial neural network generates responses consistent with probability matching. This behavior is then used to create an operant procedure for network learning. We use the multiarmed bandit (Gittins 1989), a classic problem of choice behavior, to illustrate that operant training balances exploiting the bandit arm expected to pay off most frequently with exploring the other arms. Perceptrons provide a medium for relating results from neural networks, genetic algorithms, animal learning, contingency theory, reinforcement learning, and theories of choice.
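
The abstract does not detail the network architecture or the operant training rule, but the ideas it names (probability matching and the exploit/explore tradeoff on a multiarmed bandit) can be illustrated with a minimal sketch. The Python snippet below is an assumption-laden toy, not the authors' model: the arm payoff probabilities (ARM_PROBS), the learning rate, and the delta-rule update on per-arm weights are all invented for illustration. Each weight tracks an arm's payoff rate, arms are chosen in proportion to the weights, and the resulting choice proportions approximate the matching prediction while the poorer arms continue to be sampled.

```python
import random

# Toy illustration (assumed parameters, not the paper's model): one weight per
# bandit arm, updated by a delta rule from the binary reward on each trial.
# Arms are chosen stochastically in proportion to the weights, so choice
# proportions come to approximate relative payoff rates (probability matching)
# while low-paying arms still get explored occasionally.

ARM_PROBS = [0.7, 0.3, 0.1]   # assumed payoff probability of each arm
LEARNING_RATE = 0.05          # assumed delta-rule step size
TRIALS = 20000

weights = [0.5] * len(ARM_PROBS)        # running estimates of each arm's payoff rate
choice_counts = [0] * len(ARM_PROBS)

def choose(ws):
    """Pick an arm with probability proportional to its weight."""
    r = random.uniform(0.0, sum(ws))
    acc = 0.0
    for i, w in enumerate(ws):
        acc += w
        if r <= acc:
            return i
    return len(ws) - 1

for _ in range(TRIALS):
    arm = choose(weights)
    choice_counts[arm] += 1
    reward = 1.0 if random.random() < ARM_PROBS[arm] else 0.0
    # Operant-style delta rule: nudge the chosen arm's weight toward the outcome.
    weights[arm] += LEARNING_RATE * (reward - weights[arm])

total = sum(ARM_PROBS)
for i, p in enumerate(ARM_PROBS):
    print(f"arm {i}: payoff {p:.2f}  "
          f"choice proportion {choice_counts[i] / TRIALS:.2f}  "
          f"matching prediction {p / total:.2f}")
```

For these assumed payoffs the matching prediction for the best arm is 0.7/1.1 ≈ 0.64 rather than exclusive choice of it; a purely exploiting agent would instead pick that arm on every trial, while the proportional choice rule keeps exploring the other arms.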