Compared with value-function-based reinforcement learning (RL) methods, policy-gradient RL methods have better convergence properties, but the high variance of the policy-gradient estimate degrades learning performance. To improve both the convergence speed of policy-gradient RL and the precision of the gradient estimate, an Actor-Critic (AC) learning algorithm based on incremental least-squares temporal difference learning with eligibility traces (iLSTD(λ)) is proposed, combining the AC framework, function approximation, and the iLSTD(λ) algorithm. The Critic estimates the value function with iLSTD(λ), and the Actor updates the policy parameters with a regular gradient. Simulation results on a 10×10 grid world show that the iLSTD(λ)-based AC algorithm not only converges quickly but also yields accurate gradient estimates.
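The abstract fixes the overall structure (an iLSTD(λ) Critic driving a regular-gradient Actor) but not the implementation details. The following is a minimal, illustrative Python sketch of that structure, assuming tabular one-hot state features, a softmax policy, and a simple 10×10 grid world with a step cost of -1 and a goal in the far corner; all names, constants, and the environment dynamics are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Assumed 10x10 grid world: start at cell 0, goal at the last cell,
# reward -1 per step and 0 on reaching the goal (illustrative choice).
N = 10
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
N_S, N_A = N * N, len(ACTIONS)

def step(s, a):
    r, c = divmod(s, N)
    dr, dc = ACTIONS[a]
    r2 = min(max(r + dr, 0), N - 1)
    c2 = min(max(c + dc, 0), N - 1)
    s2 = r2 * N + c2
    done = (s2 == N_S - 1)
    return s2, (0.0 if done else -1.0), done

def phi(s):                       # one-hot state features for the Critic
    x = np.zeros(N_S); x[s] = 1.0
    return x

def policy(w, s):                 # Actor: softmax over action preferences
    prefs = w[s]                  # w has shape (N_S, N_A)
    p = np.exp(prefs - prefs.max())
    return p / p.sum()

gamma, lam = 0.99, 0.8
alpha, beta = 0.1, 0.01           # Critic / Actor step sizes (assumed)
m = 1                             # iLSTD: weight dimensions updated per step

theta = np.zeros(N_S)             # Critic weights, V(s) ~ theta @ phi(s)
A = np.zeros((N_S, N_S)); b = np.zeros(N_S)
mu = np.zeros(N_S)                # mu = b - A @ theta, kept incrementally
w = np.zeros((N_S, N_A))          # Actor (policy) parameters

for episode in range(500):
    s, z = 0, np.zeros(N_S)       # reset state and eligibility trace
    for t in range(400):
        p = policy(w, s)
        a = np.random.choice(N_A, p=p)
        s2, r, done = step(s, a)

        # Critic: iLSTD(lambda) statistics and incremental solve.
        z = gamma * lam * z + phi(s)
        d = phi(s) - (0.0 if done else gamma) * phi(s2)
        dA = np.outer(z, d)
        A += dA
        b += r * z
        mu += r * z - dA @ theta          # keep mu = b - A @ theta current
        for _ in range(m):                # descend along largest-|mu| dims
            j = np.argmax(np.abs(mu))
            theta[j] += alpha * mu[j]
            mu -= alpha * mu[j] * A[:, j]

        # Actor: regular-gradient update with the TD error as the critic signal.
        delta = r + (0.0 if done else gamma) * theta[s2] - theta[s]
        grad_log = -p                     # d log pi(a|s) / d w[s] for softmax
        grad_log[a] += 1.0
        w[s] += beta * delta * grad_log

        s = s2
        if done:
            break
```

The Critic loop follows the iLSTD pattern of maintaining the least-squares statistics A and b and relaxing only the m weight components with the largest residual |mu_j| per step, which is what makes the method incremental; the per-step cost of the Actor update is that of a standard likelihood-ratio policy gradient.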