The recursive least-squares (RLS) algorithm is one of the best-known algorithms in adaptive filtering, system identification, and adaptive control. Its popularity is mainly due to its fast convergence speed, which is considered optimal in practice. In this paper, RLS methods are used to solve reinforcement learning problems, and two new reinforcement learning algorithms with linear value-function approximators are proposed and analyzed: RLS-TD(λ) and Fast-AHC (Fast Adaptive Heuristic Critic). RLS-TD(λ) extends RLS-TD(0) from λ = 0 to general 0 ≤ λ ≤ 1, so it is a multi-step temporal-difference (TD) learning algorithm based on RLS methods. Its convergence with probability one, and the limit to which it converges, are proved for ergodic Markov chains. Compared with the existing LS-TD(λ) algorithm, RLS-TD(λ) is computationally cheaper and better suited to online learning. Its effectiveness is analyzed and verified in learning-prediction experiments on Markov chains over a wide range of parameter settings.

The Fast-AHC algorithm is derived by applying the proposed RLS-TD(λ) algorithm in the critic network of the adaptive heuristic critic (AHC) method. Unlike the conventional AHC algorithm, Fast-AHC uses RLS methods to improve the learning-prediction efficiency of the critic. Learning-control experiments on the cart-pole balancing and acrobot swing-up problems compare the data efficiency of Fast-AHC with that of the conventional AHC, and show that the data efficiency of learning control can also be improved by using RLS methods in the learning-prediction process of the critic. The performance of Fast-AHC is further compared with that of the AHC method using LS-TD(λ). The experiments also demonstrate that different initial values of the variance matrix in RLS-TD(λ) are required to obtain good performance, not only in learning prediction but also in learning control. The experimental results are analyzed in light of existing theoretical work on the transient phase of forgetting-factor RLS methods.
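To make the recursive update concrete, the sketch below shows one plausible form of an RLS-TD(λ) step with a linear value function V(s) = φ(s)ᵀθ, an eligibility trace, and a recursively maintained matrix P whose initial scale δ plays the role of the variance-matrix initialization discussed above. This is a minimal illustration written from the description in the abstract; the class name, symbol names, and the exact placement of γ in the trace update are assumptions and should be checked against the original paper.

```python
# Minimal sketch of an RLS-TD(lambda) update (illustrative, not the paper's exact pseudocode).
import numpy as np

class RLSTDLambda:
    def __init__(self, n_features, gamma=0.95, lam=0.7, delta=500.0):
        self.gamma = gamma                       # discount factor
        self.lam = lam                           # trace-decay parameter, 0 <= lambda <= 1
        self.theta = np.zeros(n_features)        # linear value-function weights
        self.z = np.zeros(n_features)            # eligibility trace
        self.P = delta * np.eye(n_features)      # "variance" matrix; delta sets its initial value

    def update(self, phi, phi_next, reward):
        """One recursive least-squares TD update for the transition s -> s'."""
        # Accumulate the eligibility trace (discounted form assumed here).
        self.z = self.gamma * self.lam * self.z + phi
        d = phi - self.gamma * phi_next          # temporal-difference feature vector
        Pz = self.P @ self.z
        k = Pz / (1.0 + d @ Pz)                  # gain vector via a Sherman-Morrison-style step
        td_error = reward - d @ self.theta       # linear TD residual r + gamma*V(s') - V(s)
        self.theta += k * td_error               # weight update
        self.P -= np.outer(k, d @ self.P)        # recursive update of the inverse matrix

    def value(self, phi):
        return phi @ self.theta
```

In this sketch, δ (the initial scale of P) is the single knob corresponding to the variance-matrix initialization whose influence on both learning prediction and learning control is examined in the experiments; larger δ makes the early updates behave more like unregularized least squares, while smaller δ damps the transient phase.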