A hybrid evolving and gradient strategy for approximating policy evaluation on online critic-actor learning

Authors:
Jian Fu;Haibo He;Huiying Li;Qing Liu
Affiliations:
School of Automation, Wuhan University of Technology, Wuhan, Hubei, China;Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI;School of Automation, Wuhan University of Technology, Wuhan, Hubei, China;School of Automation, Wuhan University of Technology, Wuhan, Hubei, China
Venue:
ISNN'12 Proceedings of the 9th international conference on Advances in Neural Networks - Volume Part I
Year:
2012


Abstract

In this paper, we propose a novel strategy for approximating policy evaluation during the online critic-actor learning procedure. In the early stage, we adopt adaptive differential evolution with elites (ADEE), which is good at global search, to optimize the moving least-squares temporal difference with one step (MLSTD(0)). We then apply a gradient method to perform the local search efficiently and effectively. This resolves the exploration-exploitation dilemma in the weight search for the critic neural network. Simulation results on the online learning control of a cart-pole benchmark demonstrate the efficiency of the presented method.
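
The two-stage idea in the abstract (evolutionary global search first, gradient-based local search afterwards) can be illustrated with a minimal sketch. It is not the authors' ADEE or MLSTD(0): it assumes a linear critic, uses plain DE/rand/1/bin with greedy parent-versus-trial selection in place of ADEE, and takes the squared residual of a standard batch LSTD(0) system as a stand-in objective. The function names, parameter values, and random toy data are all hypothetical.

```python
import numpy as np

def lstd_system(phi, phi_next, rewards, gamma):
    """Build a batch LSTD(0) linear system A w = b (assumed stand-in objective)."""
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    return A, b

def hybrid_search(A, b, pop_size=20, de_gens=50, gd_steps=200,
                  F=0.5, CR=0.9, seed=0):
    """Stage 1: differential evolution; stage 2: gradient descent."""
    rng = np.random.default_rng(seed)
    dim = A.shape[1]

    def loss(w):
        # Squared residual of the linear system, 0.5 * ||A w - b||^2.
        return 0.5 * np.sum((A @ w - b) ** 2)

    # Stage 1: global search with DE/rand/1/bin (plain DE, not ADEE).
    pop = rng.uniform(-5.0, 5.0, size=(pop_size, dim))
    fit = np.array([loss(w) for w in pop])
    for _ in range(de_gens):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            r1, r2, r3 = rng.choice(others, size=3, replace=False)
            mutant = pop[r1] + F * (pop[r2] - pop[r3])
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True  # keep at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            f_trial = loss(trial)
            # Greedy selection: a parent is replaced only by a trial that is
            # no worse, so the best individual is never lost.
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    w = pop[np.argmin(fit)].copy()
    de_loss = loss(w)

    # Stage 2: local refinement by gradient descent from the best DE point.
    # Step size 1/L, where L = ||A||_2^2 bounds the gradient's Lipschitz
    # constant, so the loss cannot increase.
    lr = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(gd_steps):
        w -= lr * (A.T @ (A @ w - b))
    return w, de_loss, loss(w)

# Hypothetical toy data: random features, next-state features, and rewards.
data_rng = np.random.default_rng(1)
phi = data_rng.normal(size=(100, 4))
phi_next = data_rng.normal(size=(100, 4))
rewards = data_rng.normal(size=100)

A, b = lstd_system(phi, phi_next, rewards, gamma=0.95)
w, de_loss, final_loss = hybrid_search(A, b)
print("loss after DE stage:", de_loss)
print("loss after gradient stage:", final_loss)
```

In this sketch the switch from the evolutionary stage to the gradient stage happens after a fixed number of generations. The abstract does not say how the switch is made in the paper, so the fixed budget is an assumption made only for illustration.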