A three-network architecture for on-line learning and optimization based on adaptive dynamic programming

Authors:
Haibo He; Zhen Ni; Jian Fu
Affiliations:
Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881, USA (Haibo He, Zhen Ni); School of Automation, Wuhan University of Technology, Wuhan, Hubei 430070, China (Jian Fu)
Venue:
Neurocomputing
Year:
2012

Abstract

In this paper, we propose a novel adaptive dynamic programming (ADP) architecture with three networks, namely an action network, a critic network, and a reference network, to develop an internal goal representation for online learning and optimization. Unlike the traditional ADP design, which normally consists of an action network and a critic network, our approach integrates a third network, the reference network, into the actor-critic design framework to automatically and adaptively build an internal reinforcement signal that facilitates learning and optimization over time to accomplish goals. We present the detailed design architecture and its associated learning algorithm to explain how effective learning and optimization can be achieved in this new ADP architecture. Furthermore, we test the performance of our architecture on both the cart-pole balancing task and the triple-link inverted pendulum balancing task, two popular benchmarks in the community, to demonstrate its learning and control performance over time.
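To make the data flow of the three networks concrete, here is a minimal NumPy sketch of how such an action/reference/critic loop could be wired for one online update. It assumes the action network maps the state x(t) to a control u(t), the reference network maps [x(t), u(t)] to an internal reinforcement signal s(t), and the critic maps [x(t), u(t), s(t)] to a cost-to-go estimate J(t). The error signals are assumptions borrowed from the usual action-dependent HDP (ADHDP) actor-critic form, in the style of Si and Wang's online learning control design; they are not taken from the abstract. Specifically: the reference network is fit against the external reinforcement r(t), the critic against the internal signal s(t), and the action network pushes J(t) toward an ultimate objective U_c. The class names `MLP` and `ThreeNetworkADP`, the linear output of the reference network, the hidden-layer size, the discount `alpha`, and the learning rate are all hypothetical choices for illustration.

```python
import numpy as np


class MLP:
    """One-hidden-layer network y = W2 @ tanh(W1 @ x), trained by plain gradient descent."""

    def __init__(self, n_in, n_hidden, n_out, lr, rng):
        self.W1 = rng.normal(scale=0.3, size=(n_hidden, n_in))
        self.W2 = rng.normal(scale=0.3, size=(n_out, n_hidden))
        self.lr = lr

    def forward(self, x):
        # Cache input and hidden activations for the following backward() calls.
        self.x = x
        self.h = np.tanh(self.W1 @ x)
        return self.W2 @ self.h

    def backward(self, grad_y, update=True):
        """Return dL/dx for the cached input; optionally take one descent step on the weights."""
        grad_pre = (self.W2.T @ grad_y) * (1.0 - self.h ** 2)
        grad_x = self.W1.T @ grad_pre  # computed with the pre-update weights
        if update:
            self.W2 -= self.lr * np.outer(grad_y, self.h)
            self.W1 -= self.lr * np.outer(grad_pre, self.x)
        return grad_x


class ThreeNetworkADP:
    """Hypothetical action/reference/critic loop; the error signals are ADHDP-style assumptions."""

    def __init__(self, n_x, n_hidden=6, alpha=0.95, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.n_x, self.alpha = n_x, alpha
        self.actor = MLP(n_x, n_hidden, 1, lr, rng)          # x -> u (pre-tanh)
        self.reference = MLP(n_x + 1, n_hidden, 1, lr, rng)  # [x, u] -> s (internal reinforcement)
        self.critic = MLP(n_x + 2, n_hidden, 1, lr, rng)     # [x, u, s] -> J
        self.J_prev = 0.0                                    # J(t-1)

    def step(self, x, r, U_c=0.0):
        """One online update at time t given state x(t) and external reinforcement r(t)."""
        # Forward pass through the three networks.
        u = np.tanh(self.actor.forward(x))
        s = self.reference.forward(np.concatenate([x, u]))
        J = self.critic.forward(np.concatenate([x, u, s]))[0]

        # Input sensitivities, taken before any weights change.
        dJ_din = self.critic.backward(np.ones(1), update=False)     # dJ/d[x, u, s]
        ds_din = self.reference.backward(np.ones(1), update=False)  # ds/d[x, u]
        dJ_ds = dJ_din[-1]
        dJ_du = dJ_din[self.n_x] + dJ_ds * ds_din[self.n_x]         # direct path + path via s

        # Reference network: adjust s(t) so that J(t) satisfies the Bellman-style
        # relation with the external reinforcement r(t); gradient flows through the critic.
        e_f = self.alpha * J - (self.J_prev - r)
        self.reference.backward(np.array([e_f * self.alpha * dJ_ds]))

        # Critic network: temporal-difference error built on the internal signal s(t).
        e_c = self.alpha * J - (self.J_prev - s[0])
        self.critic.backward(np.array([self.alpha * e_c]))

        # Action network: drive J(t) toward the ultimate objective U_c.
        e_a = J - U_c
        self.actor.backward(np.array([e_a * dJ_du * (1.0 - u[0] ** 2)]))

        self.J_prev = J
        return u[0], s[0], J


if __name__ == "__main__":
    agent = ThreeNetworkADP(n_x=4)
    rng = np.random.default_rng(1)
    for t in range(5):
        u, s, J = agent.step(rng.normal(size=4), r=0.0)
        print(f"t={t}  u={u:+.3f}  s={s:+.3f}  J={J:+.3f}")
```

Computing every input sensitivity before any weight update keeps the three gradient signals consistent with the same forward pass at time t. The action-network gradient includes both the direct path u → J and the indirect path u → s → J, since the reference network also sees u(t).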