Numerical recipes in C: the art of scientific computing
Steps toward artificial intelligence
Computers & thought
Gradient descent for general reinforcement learning
Proceedings of the 1998 conference on Advances in neural information processing systems II
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
Localizing Policy Gradient Estimates to Action Transition
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Learning to Cooperate via Policy Search
UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence
Localizing Search in Reinforcement Learning
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence
Reinforcement Learning in POMDP's via Direct Gradient Ascent
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Reinforcement learning: a survey
Journal of Artificial Intelligence Research
Most formulations of Reinforcement Learning depend on a single reinforcement reward value to guide the search for an optimal policy. If observations of this reward are rare or expensive, converging to a solution can be impractically slow. One way to exploit additional domain knowledge is to use more readily available, but related, quantities as secondary reinforcers to guide the search through the space of all policies. We propose a method that augments Policy Gradient Reinforcement Learning algorithms by using prior domain knowledge to estimate the desired relative levels of a set of secondary reinforcement quantities. RL can then be applied to find a policy that establishes these levels. The primary reinforcement reward is then sampled to calculate, for each secondary reinforcer, a gradient in the direction of increased primary reward. These gradients are used to improve the estimate of the relative secondary values, and the process iterates until the primary reward is maximized. We prove that the algorithm converges to a local optimum in secondary reward space, and that the rate of convergence of the performance gradient estimate in secondary reward space is independent of the size of the state space. Experimental results demonstrate that the algorithm can converge many orders of magnitude faster than standard policy gradient formulations.
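The outer loop described in the abstract — set target levels for the secondary reinforcers, learn a policy that establishes them, sample the primary reward, and ascend the estimated gradient in secondary reward space — can be sketched as follows. This is a minimal toy, not the paper's algorithm: all names are hypothetical, the inner RL step is abstracted into a black-box `primary_reward()` function, and the gradient is estimated here by central finite differences rather than by the paper's sampling scheme.

```python
# Toy sketch of the iterative scheme (all names hypothetical).
# The inner RL step (learn a policy that establishes the given secondary
# reinforcement levels) is abstracted into primary_reward(), a black box
# returning the primary reward observed once those levels are reached.

OPT = [0.6, 0.3]  # toy "true" optimal secondary levels, unknown to the learner


def primary_reward(levels):
    """Stand-in for: run RL to establish `levels`, then sample primary reward."""
    return 1.0 - sum((x - o) ** 2 for x, o in zip(levels, OPT))


def estimate_gradient(levels, eps=1e-4):
    """Central finite-difference estimate of the primary-reward gradient
    with respect to the secondary reinforcement levels."""
    grad = []
    for i in range(len(levels)):
        up = list(levels); up[i] += eps
        dn = list(levels); dn[i] -= eps
        grad.append((primary_reward(up) - primary_reward(dn)) / (2 * eps))
    return grad


def optimize_secondary_levels(init, lr=0.1, iters=200):
    """Iterate: estimate the gradient at the current secondary levels,
    then ascend it in secondary reward space."""
    levels = list(init)
    for _ in range(iters):
        grad = estimate_gradient(levels)
        levels = [x + lr * g for x, g in zip(levels, grad)]
    return levels
```

On this concave toy objective, the iteration converges to the optimal secondary levels; the point is only to make the two-level structure concrete, since each gradient step operates in the low-dimensional space of secondary levels rather than in the full state space.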