An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function

Authors:
Hajime Kimura;Shigenobu Kobayashi
Venue:
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Year:
1998

Cited by 17:

Learning Time Allocation Using Neural Networks
CG '00 Revised Papers from the Second International Conference on Computers and Games

Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
The Journal of Machine Learning Research

Reinforcement learning for quasi-passive dynamic walking of an unstable biped robot
Robotics and Autonomous Systems

Reinforcement Learning State Estimator
Neural Computation

Reinforcement learning for a biped robot based on a CPG-actor-critic method
Neural Networks

Learning CPG-based Biped Locomotion with a Policy Gradient Method: Application to a Humanoid Robot
International Journal of Robotics Research

Reinforcement Learning in Fine Time Discretization
ICANNGA '07 Proceedings of the 8th international conference on Adaptive and Natural Computing Algorithms, Part I

Learning CPG sensory feedback with policy gradient for biped locomotion for a full-body humanoid
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 3

Infinite-horizon policy-gradient estimation
Journal of Artificial Intelligence Research

Experiments with infinite-horizon, policy-gradient estimation
Journal of Artificial Intelligence Research

Real-time reinforcement learning by sequential Actor-Critics and experience replay
Neural Networks

Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning
Neural Computation

A cat-like robot real-time learning to run
ICANNGA'09 Proceedings of the 9th international conference on Adaptive and natural computing algorithms

Swarm reinforcement learning method based on an actor-critic method
SEAL'10 Proceedings of the 8th international conference on Simulated evolution and learning

Evaluation of the improved penalty avoiding rational policy making algorithm in real world environment
ACIIDS'12 Proceedings of the 4th Asian conference on Intelligent Information and Database Systems - Volume Part I

Introduction of fixed mode states into online profit sharing and its application to waist trajectory generation of biped robot
EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning

2013 Special Issue: Autonomous reinforcement learning with experience replay
Neural Networks
