This article presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi-Bellman (HJB) equation for infinite-horizon, discounted reward problems, we derive algorithms for estimating value functions and improving policies with the use of function approximators. The process of value function estimation is formulated as the minimization of a continuous-time form of the temporal difference (TD) error. Update methods based on backward Euler approximation and exponential eligibility traces are derived, and their correspondences with the conventional residual gradient, TD(0), and TD(λ) algorithms are shown. For policy improvement, two methods are formulated: a continuous actor-critic method and a value-gradient-based greedy policy. As a special case of the latter, a nonlinear feedback control law using the value gradient and the model of the input gain is derived. Advantage updating, a model-free algorithm derived previously, is also formulated in the HJB-based framework.

The performance of the proposed algorithms is first tested in a nonlinear control task of swinging up a pendulum with limited torque. The simulations show that (1) the continuous actor-critic method accomplishes the task in several times fewer trials than the conventional discrete actor-critic method; (2) among the continuous policy update methods, the value-gradient-based policy with a known or learned dynamic model performs several times better than the actor-critic method; and (3) a value function update using exponential eligibility traces is more efficient and stable than one based on Euler approximation. The algorithms are then tested on a higher-dimensional task, cart-pole swing-up, which is accomplished in several hundred trials using the value-gradient-based policy with a learned dynamic model.
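The core critic update described above can be sketched in a few lines. The following is a minimal illustrative example, not the paper's implementation: the pendulum parameters, the Gaussian radial-basis features, the time constants (`tau` for discounting, `kappa` for the eligibility trace), the learning rate, and the hand-coded swing-up controller that supplies the trajectory are all assumptions chosen for brevity. What it does follow from the text is the continuous-time TD error, δ(t) = r(t) − V(t)/τ + dV/dt, evaluated with a backward Euler difference and combined with an exponential eligibility trace.

```python
import numpy as np

def pendulum_step(theta, omega, torque, dt=0.02, m=1.0, l=1.0, g=9.8, mu=0.01):
    # Forward Euler step of a torque-limited pendulum; theta = 0 is upright.
    domega = (-mu * omega + m * g * l * np.sin(theta) + torque) / (m * l ** 2)
    return theta + dt * omega, omega + dt * domega

# Gaussian radial-basis features over (theta, omega); grid and widths are arbitrary.
centers = np.array([(a, b)
                    for a in np.linspace(-np.pi, np.pi, 7)
                    for b in np.linspace(-8.0, 8.0, 7)])
widths = np.array([0.8, 2.0])

def features(theta, omega):
    d = (np.array([theta, omega]) - centers) / widths
    return np.exp(-0.5 * np.sum(d ** 2, axis=1))

w = np.zeros(len(centers))   # value-function weights: V(x) = w . phi(x)
z = np.zeros(len(centers))   # exponential eligibility trace
tau, kappa, eta, dt = 1.0, 0.3, 0.05, 0.02   # illustrative constants

theta, omega = np.pi - 0.1, 0.0              # start near hanging down
for _ in range(500):
    # Hand-coded controller (energy pumping, then PD near upright); it stands
    # in for the learned policy so the critic sees a useful trajectory.
    torque = 5.0 * np.sign(omega) if abs(theta) > 0.5 else -2.0 * theta - omega
    torque = float(np.clip(torque, -5.0, 5.0))

    phi = features(theta, omega)
    v = w @ phi
    theta, omega = pendulum_step(theta, omega, torque, dt)
    theta = (theta + np.pi) % (2 * np.pi) - np.pi   # wrap angle into (-pi, pi)
    v_next = w @ features(theta, omega)

    r = np.cos(theta)                               # reward: pendulum tip height
    # Continuous-time TD error via backward Euler: delta = r - V/tau + dV/dt
    delta = r - v / tau + (v_next - v) / dt
    # Exponential eligibility trace with time constant kappa.
    z = (1.0 - dt / kappa) * z + (dt / kappa) * phi
    w += eta * delta * z * dt
```

In the paper's full setup the torque would instead come from the learned actor or from the value-gradient-based greedy policy, and the critic and policy would be updated together; this sketch isolates only the value-estimation step.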