This article presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi-Bellman (HJB) equation for infinite-horizon, discounted reward problems, we derive algorithms for estimating value functions and improving policies with the use of function approximators. The process of value function estimation is formulated as the minimization of a continuous-time form of the temporal difference (TD) error. Update methods based on backward Euler approximation and exponential eligibility traces are derived, and their correspondences with the conventional residual gradient, TD(0), and TD(λ) algorithms are shown. For policy improvement, two methods are formulated: a continuous actor-critic method and a value-gradient-based greedy policy. As a special case of the latter, a nonlinear feedback control law using the value gradient and the model of the input gain is derived. Advantage updating, a model-free algorithm derived previously, is also formulated in the HJB-based framework.

The performance of the proposed algorithms is first tested in a nonlinear control task of swinging up a pendulum with limited torque. The simulations show that (1) the continuous actor-critic method accomplishes the task in several times fewer trials than the conventional discrete actor-critic method; (2) among the continuous policy update methods, the value-gradient-based policy with a known or learned dynamic model performs several times better than the actor-critic method; and (3) a value function update using exponential eligibility traces is more efficient and stable than one based on Euler approximation. The algorithms are then tested on a higher-dimensional task, cart-pole swing-up, which is accomplished in several hundred trials using the value-gradient-based policy with a learned dynamic model.
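The core critic update described above can be sketched in a few lines. The following is a minimal illustrative example, not the paper's implementation: the pendulum parameters, the Gaussian radial-basis features, the time constants (`tau` for discounting, `kappa` for the eligibility trace), the learning rate, and the hand-coded swing-up controller that supplies the trajectory are all assumptions chosen for brevity. What it does follow from the text is the continuous-time TD error, δ(t) = r(t) − V(t)/τ + dV/dt, evaluated with a backward Euler difference and combined with an exponential eligibility trace.

```python
import numpy as np

def pendulum_step(theta, omega, torque, dt=0.02, m=1.0, l=1.0, g=9.8, mu=0.01):
    # Forward Euler step of a torque-limited pendulum; theta = 0 is upright.
    domega = (-mu * omega + m * g * l * np.sin(theta) + torque) / (m * l ** 2)
    return theta + dt * omega, omega + dt * domega

# Gaussian radial-basis features over (theta, omega); grid and widths are arbitrary.
centers = np.array([(a, b)
                    for a in np.linspace(-np.pi, np.pi, 7)
                    for b in np.linspace(-8.0, 8.0, 7)])
widths = np.array([0.8, 2.0])

def features(theta, omega):
    d = (np.array([theta, omega]) - centers) / widths
    return np.exp(-0.5 * np.sum(d ** 2, axis=1))

w = np.zeros(len(centers))   # value-function weights: V(x) = w . phi(x)
z = np.zeros(len(centers))   # exponential eligibility trace
tau, kappa, eta, dt = 1.0, 0.3, 0.05, 0.02   # illustrative constants

theta, omega = np.pi - 0.1, 0.0              # start near hanging down
for _ in range(500):
    # Hand-coded controller (energy pumping, then PD near upright); it stands
    # in for the learned policy so the critic sees a useful trajectory.
    torque = 5.0 * np.sign(omega) if abs(theta) > 0.5 else -2.0 * theta - omega
    torque = float(np.clip(torque, -5.0, 5.0))

    phi = features(theta, omega)
    v = w @ phi
    theta, omega = pendulum_step(theta, omega, torque, dt)
    theta = (theta + np.pi) % (2 * np.pi) - np.pi   # wrap angle into (-pi, pi)
    v_next = w @ features(theta, omega)

    r = np.cos(theta)                               # reward: pendulum tip height
    # Continuous-time TD error via backward Euler: delta = r - V/tau + dV/dt
    delta = r - v / tau + (v_next - v) / dt
    # Exponential eligibility trace with time constant kappa.
    z = (1.0 - dt / kappa) * z + (dt / kappa) * phi
    w += eta * delta * z * dt
```

In the paper's full setup the torque would instead come from the learned actor or from the value-gradient-based greedy policy, and the critic and policy would be updated together; this sketch isolates only the value-estimation step.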