The model-free Least-Squares Policy Iteration (LSPI) method has been used successfully for control problems in the context of reinforcement learning. LSPI is a promising algorithm that uses a linear approximation architecture to achieve policy optimization in the spirit of Q-learning. However, it faces challenging issues in the selection of basis functions and training samples. Inspired by the orthogonal least-squares regression method for selecting the centers of an RBF neural network, a new hybrid learning method for LSPI is proposed in this paper. The suggested method uses simulation as a tool to guide the "feature configuration" process. Results on the learning control of a cart-pole system illustrate the effectiveness of the presented method.
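The paper's specific basis-selection procedure is not reproduced here, but the LSPI core it builds on can be sketched: an LSTDQ evaluation step that solves a linear system for the Q-function weights, alternated with greedy policy improvement, over a fixed batch of samples with Gaussian RBF features. All names, the RBF width, and the toy sampling scheme below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf_features(s, a, centers, width, n_actions):
    """Gaussian RBF features over the state, replicated once per action."""
    phi = np.zeros(len(centers) * n_actions)
    block = np.exp(-np.sum((centers - np.atleast_1d(s)) ** 2, axis=1)
                   / (2 * width ** 2))
    phi[a * len(centers):(a + 1) * len(centers)] = block
    return phi

def lstdq(samples, phi, policy, gamma, k):
    """One LSTDQ step: solve A w = b for the linear Q-weights."""
    A = 1e-6 * np.eye(k)          # small ridge term keeps A invertible
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)

def lspi(samples, phi, k, n_actions, gamma=0.95, n_iter=20, tol=1e-6):
    """Alternate policy evaluation (LSTDQ) and greedy improvement."""
    w = np.zeros(k)
    for _ in range(n_iter):
        greedy = lambda s, w=w: max(range(n_actions),
                                    key=lambda a: phi(s, a) @ w)
        w_new = lstdq(samples, phi, greedy, gamma, k)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w
```

In this sketch the sample batch is fixed, so each policy-iteration step reuses the same data, which is the property that makes basis-function and sample selection so influential for LSPI's final policy.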