In this article, we propose and analyze a class of actor-critic algorithms. These are two-time-scale algorithms: the critic uses temporal difference (TD) learning with a linearly parameterized approximation architecture, and the actor is updated in an approximate gradient direction based on information provided by the critic. We show that the features for the critic should ideally span a subspace prescribed by the choice of parameterization of the actor. We study these algorithms for Markov decision processes with Polish (complete separable metric) state and action spaces, and we state and prove two results on their convergence.
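The claim that the critic's features are prescribed by the actor's parameterization can be made concrete with the policy gradient identity. The statement below is a sketch under assumptions not spelled out above: an average-reward criterion rho(theta), a finite state-action space, and a randomized policy mu_theta(u | x) that is differentiable in theta. Here eta_theta is the stationary distribution of state-action pairs under mu_theta, Q_theta is the differential action-value function, and Pi_theta is the orthogonal projection, under the inner product defined below, onto any subspace that contains the functions psi_theta^1, ..., psi_theta^n.

```latex
\psi_\theta(x,u) = \nabla_\theta \ln \mu_\theta(u \mid x),
\qquad
\langle f, g \rangle_\theta = \sum_{x,u} \eta_\theta(x,u)\, f(x,u)\, g(x,u),

\frac{\partial \rho(\theta)}{\partial \theta_i}
  = \langle Q_\theta, \psi_\theta^i \rangle_\theta
  = \langle \Pi_\theta Q_\theta, \psi_\theta^i \rangle_\theta ,
  \qquad i = 1, \dots, n .
```

The second equality holds because Q_theta - Pi_theta Q_theta is orthogonal to every psi_theta^i. The gradient therefore depends on Q_theta only through its projection, so a linear critic whose features span a subspace containing the psi_theta^i carries all the information the actor needs.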
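The two-time-scale structure can be illustrated with a small simulation. This is a minimal sketch, not the authors' algorithm, and it rests on several assumptions. The MDP is finite, and the actor is a tabular softmax policy. The critic runs average-reward TD(0), since the abstract says only "temporal difference learning". Constant step sizes with `beta` much smaller than `gamma` stand in for the two time scales; with diminishing steps, a common condition is beta_k / gamma_k -> 0. Here `gamma` is a step size, not a discount factor. The critic's features are the compatible features psi_theta plus a one-hot state indicator, so their span contains every psi_theta^i. All names (`policy_probs`, `psi`, `critic_features`, `actor_critic`) are hypothetical.

```python
import numpy as np


def policy_probs(theta, x):
    """mu_theta(. | x) for a tabular softmax (Gibbs) policy."""
    prefs = theta[x] - theta[x].max()  # shift for numerical stability
    p = np.exp(prefs)
    return p / p.sum()


def psi(theta, x, u):
    """Compatible features psi_theta(x, u) = grad_theta log mu_theta(u | x)."""
    g = np.zeros_like(theta)
    g[x] = -policy_probs(theta, x)
    g[x, u] += 1.0
    return g.ravel()


def critic_features(theta, x, u):
    """phi(x, u): psi_theta(x, u) followed by a one-hot state indicator.

    Including psi_theta guarantees the critic's span contains every psi_theta^i.
    """
    onehot = np.zeros(theta.shape[0])
    onehot[x] = 1.0
    return np.concatenate([psi(theta, x, u), onehot])


def actor_critic(P, R, n_steps=20_000, gamma=0.05, beta=0.005, seed=0):
    """Average-reward actor-critic: fast TD(0) critic, slow gradient actor.

    P[x, u] is the next-state distribution; R[x, u] is the reward for u in x.
    Returns the actor parameters theta and the average-reward estimate rho.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    theta = np.zeros((n_states, n_actions))  # actor parameters
    w = np.zeros(n_states * n_actions + n_states)  # critic weights
    rho = 0.0  # running estimate of the average reward

    x = 0
    u = rng.choice(n_actions, p=policy_probs(theta, x))
    for _ in range(n_steps):
        r = R[x, u]
        x_next = rng.choice(n_states, p=P[x, u])
        u_next = rng.choice(n_actions, p=policy_probs(theta, x_next))

        phi = critic_features(theta, x, u)
        phi_next = critic_features(theta, x_next, u_next)

        # Critic (fast time scale): average-reward TD(0) on Q_hat = w . phi.
        delta = r - rho + w @ phi_next - w @ phi
        rho += gamma * (r - rho)
        w += gamma * delta * phi

        # Actor (slow time scale): step along Q_hat(x, u) * psi_theta(x, u),
        # an approximate gradient of the average reward.
        q_hat = w @ phi
        theta += beta * q_hat * psi(theta, x, u).reshape(theta.shape)

        x, u = x_next, u_next
    return theta, rho
```

For example, on a two-state MDP with uniform transitions where action 1 always pays reward 1, `actor_critic(np.full((2, 2, 2), 0.5), np.array([[0.0, 1.0], [0.0, 1.0]]))` should push mu_theta(1 | x) toward 1 in both states. The actor uses the critic's estimate Q_hat rather than the TD error. Because psi_theta has zero mean under mu_theta in each state, any constant offset in Q_hat adds noise but no bias to the actor's step.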