Restricted gradient-descent algorithm for value-function approximation in reinforcement learning

  • Authors:
  • André da Motta Salles Barreto; Charles W. Anderson

  • Affiliations:
  • Programa de Engenharia Civil/COPPE, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil; Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA

  • Venue:
  • Artificial Intelligence
  • Year:
  • 2008

Abstract

This work presents the restricted gradient-descent (RGD) algorithm, a training method for local radial-basis function networks developed specifically for use in reinforcement learning. The RGD algorithm can be seen as a way to extract relevant features from the state space to feed a linear model that computes an approximation of the value function. Its basic idea is to restrict the way the standard gradient-descent algorithm changes the hidden units of the approximator, which results in conservative modifications that make the learning process less prone to divergence. The algorithm is also able to configure the topology of the network, an important characteristic in reinforcement learning, where the changing policy may impose different requirements on the approximator's structure. Computational experiments show that the RGD algorithm consistently generates better value-function approximations than the standard gradient-descent method, and that the latter is more susceptible to divergence. In the pole-balancing and Acrobot tasks, RGD combined with SARSA yields results competitive with other methods from the literature, including evolutionary and recent reinforcement-learning algorithms.
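
To make the idea concrete, the sketch below shows a radial-basis function value approximator updated with a SARSA-style TD error, in which changes to the hidden layer are deliberately restricted. This is only an illustration of the general principle described in the abstract, not the paper's algorithm: the specific restriction used here (moving only the basis center closest to the current state, with a small step size) and the unit-insertion rule (adding a center when no existing unit responds strongly) are assumptions made for the example.

```python
# Illustrative sketch, NOT the paper's exact RGD algorithm: an RBF value-function
# approximator with (i) standard gradient steps on the linear output weights,
# (ii) a restricted, conservative step on the hidden layer (only the closest
# Gaussian center moves), and (iii) topology configuration by adding a unit
# when the current state is poorly covered. All specific rules are assumptions.
import numpy as np


class RestrictedRBFApproximator:
    def __init__(self, state_dim, width=0.5, add_threshold=0.1,
                 lr_linear=0.1, lr_center=0.01):
        self.centers = np.empty((0, state_dim))   # hidden-unit centers
        self.weights = np.empty(0)                # linear output weights
        self.width = width                        # shared Gaussian width
        self.add_threshold = add_threshold        # coverage threshold for new units
        self.lr_linear = lr_linear
        self.lr_center = lr_center                # deliberately small (conservative) step

    def _activations(self, s):
        if len(self.centers) == 0:
            return np.empty(0)
        d2 = np.sum((self.centers - s) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def value(self, s):
        phi = self._activations(s)
        return float(phi @ self.weights) if phi.size else 0.0

    def update(self, s, td_error):
        phi = self._activations(s)
        # Configure the topology: add a hidden unit if no existing one covers s.
        if phi.size == 0 or phi.max() < self.add_threshold:
            self.centers = np.vstack([self.centers, np.asarray(s, dtype=float)])
            self.weights = np.append(self.weights, 0.0)
            phi = self._activations(s)
        # Standard semi-gradient step on the linear output weights.
        self.weights += self.lr_linear * td_error * phi
        # Restricted step on the hidden layer: only the closest center moves,
        # following the gradient of the value estimate w.r.t. that center.
        k = int(np.argmax(phi))
        grad_center = self.weights[k] * phi[k] * (s - self.centers[k]) / self.width ** 2
        self.centers[k] += self.lr_center * td_error * grad_center
```

In this sketch, keeping the feature space nearly fixed between updates is what makes the linear part of the model behave well under bootstrapped TD targets; letting all centers drift with full gradient steps is the kind of aggressive modification the abstract associates with divergence of the standard method.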