Least-Squares Methods in Reinforcement Learning for Control

  • Authors:
  • Michail G. Lagoudakis, Ronald Parr, Michael L. Littman

  • Venue:
  • SETN '02: Proceedings of the Second Hellenic Conference on AI, Methods and Applications of Artificial Intelligence
  • Year:
  • 2002

Abstract

Least-squares methods have been used successfully for prediction problems in reinforcement learning, but little has been done to extend them to control problems. This paper presents an overview of our research efforts in using least-squares techniques for control. In our early attempts, we considered a direct extension of the Least-Squares Temporal Difference (LSTD) algorithm in the spirit of Q-learning. Later, an effort to remedy some limitations of this algorithm (approximation bias, poor sample utilization) led to the Least-Squares Policy Iteration (LSPI) algorithm, a model-free form of approximate policy iteration that makes efficient use of training samples, however they were collected. The algorithms are demonstrated on a variety of learning domains, including algorithm selection, inverted pendulum balancing, bicycle balancing and riding, multiagent learning in factored domains, and, recently, two-player zero-sum Markov games and the game of Tetris.
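To make the LSPI scheme concrete, below is a minimal sketch of its core loop: LSTDQ evaluates a fixed policy by solving a linear system over a batch of samples, and LSPI alternates that evaluation with greedy policy improvement. This is an illustrative reconstruction, not the authors' reference implementation; the feature map `phi`, the discrete action set `actions`, the `(s, a, r, s', done)` sample format, and the small ridge term added for numerical stability are all assumptions of this sketch.

```python
import numpy as np

def lstdq(samples, phi, policy, gamma, k):
    """LSTDQ: solve for weights w with Q(s, a) ~ phi(s, a) @ w for a
    fixed policy, from a batch of (s, a, r, s_next, done) samples."""
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next, done in samples:
        f = phi(s, a)
        # Successor features under the policy being evaluated;
        # terminal transitions contribute no successor term.
        f_next = np.zeros(k) if done else phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    # Small ridge term (an assumption) guards against a singular A.
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)

def lspi(samples, phi, actions, gamma, k, n_iters=20, tol=1e-6):
    """LSPI: repeatedly evaluate the greedy policy induced by the
    current weights, reusing the same arbitrarily collected batch."""
    w = np.zeros(k)
    for _ in range(n_iters):
        policy = lambda s, w=w: max(actions, key=lambda a: phi(s, a) @ w)
        w_new = lstdq(samples, phi, policy, gamma, k)
        if np.linalg.norm(w_new - w) < tol:
            break
        w = w_new
    return w
```

Because LSTDQ solves for the fixed point in closed form rather than by stochastic updates, every sample in the batch informs every iteration, which is the sense in which LSPI makes efficient use of arbitrarily collected data.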