Input generalization in delayed reinforcement learning: an algorithm and performance comparisons

  • Authors:
  • David Chapman; Leslie Pack Kaelbling

  • Affiliations:
  • Teleos Research, Palo Alto, CA; Teleos Research, Palo Alto, CA

  • Venue:
  • IJCAI'91: Proceedings of the 12th International Joint Conference on Artificial Intelligence - Volume 2
  • Year:
  • 1991

Abstract

Delayed reinforcement learning is an attractive framework for the unsupervised learning of action policies for autonomous agents. Some existing delayed reinforcement learning techniques have shown promise in simple domains, but a number of hurdles must be cleared before they are applicable to realistic problems. This paper describes one such difficulty, the input generalization problem (whereby the system must generalize to produce similar actions in similar situations), and an implemented solution, the G algorithm. The algorithm recursively splits the state space using statistical measures of differences in the reinforcement received. Connectionist backpropagation has previously been used for input generalization in reinforcement learning; we compare the two techniques analytically and empirically. The G algorithm's sound statistical basis makes it easy to predict when it should and should not work, whereas the behavior of backpropagation is unpredictable. We found that a previously reported success of backpropagation can be explained by the linearity of its application domain, and that in another domain G reliably found the optimal policy, whereas no run of backpropagation, across many combinations of parameters, did.
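
The abstract only sketches the G algorithm, so the following Python fragment is a rough, hypothetical illustration of what "recursive splitting of the state space based on statistical measures of differences in reinforcement" can look like: a tree over bit-vector states that splits a leaf on the input bit whose two halves receive the most significantly different reinforcement, using Welch's t statistic. The names, threshold, and choice of test are assumptions for illustration, not the paper's exact procedure.

```python
# Illustrative sketch (assumed details, not the paper's exact G algorithm):
# recursively split a bit-vector state space on the input bit whose 0/1
# halves show the most statistically different reinforcement.

from dataclasses import dataclass, field
from statistics import mean, variance
from math import sqrt
from typing import List, Optional, Tuple

Sample = Tuple[Tuple[int, ...], float]  # (input bit vector, reinforcement)

def welch_t(xs: List[float], ys: List[float]) -> float:
    """Welch's t statistic for two independent samples (0 if degenerate)."""
    if len(xs) < 2 or len(ys) < 2:
        return 0.0
    denom = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return abs(mean(xs) - mean(ys)) / denom if denom > 0 else 0.0

@dataclass
class Node:
    samples: List[Sample] = field(default_factory=list)
    split_bit: Optional[int] = None
    children: Optional[Tuple["Node", "Node"]] = None

    def value(self) -> float:
        """Predicted reinforcement for states that fall in this leaf."""
        return mean(r for _, r in self.samples) if self.samples else 0.0

def split(node: Node, n_bits: int, t_threshold: float = 2.0) -> None:
    """Recursively split a leaf on the most relevant input bit, if any."""
    best_bit, best_t = None, t_threshold
    for bit in range(n_bits):
        zeros = [r for s, r in node.samples if s[bit] == 0]
        ones = [r for s, r in node.samples if s[bit] == 1]
        t = welch_t(zeros, ones)
        if t > best_t:
            best_bit, best_t = bit, t
    if best_bit is None:
        return  # no bit makes a statistically significant difference
    node.split_bit = best_bit
    left = Node([sv for sv in node.samples if sv[0][best_bit] == 0])
    right = Node([sv for sv in node.samples if sv[0][best_bit] == 1])
    node.children = (left, right)
    split(left, n_bits, t_threshold)
    split(right, n_bits, t_threshold)
```

In this reading, leaves that are never split generalize: all states reaching the same leaf share one predicted reinforcement (and, in a full learner, one action-value estimate), while the statistical test guards against splitting on irrelevant input bits.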