Least-squares methods have been used successfully for prediction problems in reinforcement learning, but little has been done to extend them to control problems. This paper presents an overview of our research efforts in using least-squares techniques for control. In our early attempts, we considered a direct extension of the Least-Squares Temporal Difference (LSTD) algorithm in the spirit of Q-learning. Later, an effort to remedy some limitations of this algorithm (approximation bias, poor sample utilization) led to the Least-Squares Policy Iteration (LSPI) algorithm, a form of model-free approximate policy iteration that makes efficient use of training samples collected in an arbitrary manner. The algorithms are demonstrated on a variety of learning domains, including algorithm selection, inverted pendulum balancing, bicycle balancing and riding, multiagent learning in factored domains, and, recently, two-player zero-sum Markov games and the game of Tetris.
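To make the policy-iteration structure concrete, the following is a minimal NumPy sketch of the two pieces the abstract names: an LSTDQ-style evaluation step, which solves a linear system for the weights of a linear Q-function approximation of a fixed policy from a batch of (s, a, r, s') samples, and an LSPI-style outer loop that alternates evaluation with greedy policy improvement. The feature map, the tiny deterministic MDP in the usage example, and the regularization constant are illustrative assumptions, not the authors' experimental setup.

```python
import numpy as np

def lstdq(samples, phi, policy, k, gamma=0.9):
    """LSTDQ-style evaluation: solve A w = b for the weights of a
    linear Q-function of `policy`, from a fixed batch of samples.
    `samples` is a list of (s, a, r, s_next); `phi(s, a)` returns a
    length-k feature vector."""
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    # Small ridge term (an assumption) keeps A invertible on sparse batches.
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)

def lspi(samples, phi, k, actions, gamma=0.9, iters=20, tol=1e-6):
    """LSPI-style loop: reuse the same batch of samples at every
    iteration, alternating LSTDQ evaluation with greedy improvement."""
    w = np.zeros(k)
    for _ in range(iters):
        # Greedy policy with respect to the current weights.
        policy = lambda s, w=w: max(actions, key=lambda a: phi(s, a) @ w)
        w_new = lstdq(samples, phi, policy, k, gamma)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w
```

As a usage example on a hypothetical two-state MDP (action 1 in state 1 yields reward 1 and stays; everything else yields 0), with one-hot features over state-action pairs, a single pass over the four deterministic transitions is enough for the loop to converge to the policy that always takes action 1. Because the batch is fixed, each improvement step reuses the same samples, which is the sample-efficiency point the abstract makes.

```python
phi = lambda s, a: np.eye(4)[2 * s + a]          # one-hot over (s, a)
samples = [(0, 0, 0.0, 0), (0, 1, 0.0, 1),
           (1, 0, 0.0, 0), (1, 1, 1.0, 1)]
w = lspi(samples, phi, k=4, actions=[0, 1])
greedy = lambda s: max([0, 1], key=lambda a: phi(s, a) @ w)
```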