IEEE Transactions on Neural Networks
Convergence of the value-iteration-based heuristic dynamic programming (HDP) algorithm is proven for general nonlinear systems. That is, it is shown that HDP converges to the optimal control and to the optimal value function that solves the Hamilton-Jacobi-Bellman equation arising in infinite-horizon discrete-time (DT) nonlinear optimal control. It is assumed that, at each iteration, the value and action update equations can be solved exactly. Two standard neural networks (NNs) are used: a critic NN approximates the value function, while an action NN approximates the optimal control policy. It is stressed that this approach allows HDP to be implemented without knowledge of the internal dynamics of the system. The exact-solution assumption holds for some classes of nonlinear systems and, in particular, for the DT linear quadratic regulator (LQR), where the action is linear and the value is quadratic in the states, so the NNs have zero approximation error. It is stressed that, for the LQR, HDP may be implemented with two NNs and without knowing the system A matrix. This fact is not generally appreciated in the folklore of HDP for the DT LQR, where only a single critic NN is typically used.
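For the DT LQR special case discussed above, the HDP value iteration can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the system matrices below are made up, the critic is represented exactly by the quadratic kernel P (so its "training" is an exact update rather than NN fitting), and the actor is the linear gain K obtained from the action update.

```python
import numpy as np

# Hypothetical 2-state system (illustrative values only, not from the paper)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)          # state cost x' Q x
R = np.array([[1.0]])  # control cost u' R u

# HDP value iteration starts from the zero value function V_0 = 0,
# i.e. a zero critic kernel P.
P = np.zeros((2, 2))
for _ in range(500):
    # Action update: control that minimizes one-step cost plus current value,
    # u = K x with K = -(R + B'PB)^{-1} B'PA
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    # Value update: one-step cost under that action plus value of next state
    Acl = A + B @ K
    P = Q + K.T @ R @ K + Acl.T @ P @ Acl

# At convergence, P satisfies the discrete-time algebraic Riccati equation:
# P = A'PA - A'PB (R + B'PB)^{-1} B'PA + Q
residual = (A.T @ P @ A - P + Q
            - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A))
print("DARE residual norm:", np.linalg.norm(residual))
```

In an NN implementation of the same scheme, the exact P and K updates above are replaced by least-squares fits of the critic and action networks to sampled state data, which is what removes the need to know the A matrix.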