On Actor-Critic Algorithms

Authors:
Vijay R. Konda; John N. Tsitsiklis
Venue:
SIAM Journal on Control and Optimization
Year:
2003

Citing 0
Cited 48

From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning

Discrete Event Dynamic Systems
A Geometric Approach to Multi-Criterion Reinforcement Learning

The Journal of Machine Learning Research
Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning

The Journal of Machine Learning Research
An analytic modelling approach for network routing algorithms that use "ant-like" mobile agents

Computer Networks: The International Journal of Computer and Telecommunications Networking
Fuzzy Policy Reinforcement Learning in Cooperative Multi-robot Systems

Journal of Intelligent and Robotic Systems
Reinforcement Learning, Spike-Time-Dependent Plasticity, and the BCM Rule

Neural Computation
Reinforcement learning for a biped robot based on a CPG-actor-critic method

Neural Networks
Dynamics based control with an application to area-sweeping problems

Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems
Learning CPG-based Biped Locomotion with a Policy Gradient Method: Application to a Humanoid Robot

International Journal of Robotics Research
Non-parametric policy gradients: a unified treatment of propositional and relational domains

Proceedings of the 25th international conference on Machine learning
Reinforcement Learning in Fine Time Discretization

ICANNGA '07 Proceedings of the 8th international conference on Adaptive and Natural Computing Algorithms, Part I
Finding Exploratory Rewards by Embodied Evolution and Constrained Reinforcement Learning in the Cyber Rodents

Neural Information Processing
Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs

ECML PKDD '08 Proceedings of the European conference on Machine Learning and Knowledge Discovery in Databases - Part II
Passive dynamic walker controller design employing an RLS-based natural actor-critic learning algorithm

Engineering Applications of Artificial Intelligence
2008 Special Issue: Finding intrinsic rewards by embodied evolution and constrained reinforcement learning

Neural Networks
Simulation-Based Optimization Algorithms for Finite-Horizon Markov Decision Processes

Simulation
Basis Expansion in Natural Actor Critic Methods

Recent Advances in Reinforcement Learning
A New Learning Algorithm for Optimal Stopping

Discrete Event Dynamic Systems
Optimal parameter trajectory estimation in parameterized SDEs: An algorithmic procedure

ACM Transactions on Modeling and Computer Simulation (TOMACS)
A spiking neural network model of an actor-critic learning agent

Neural Computation
Direct Policy Search Reinforcement Learning for Robot Control

Proceedings of the 2005 conference on Artificial Intelligence Research and Development
Exploiting locality of interactions using a policy-gradient approach in multiagent learning

Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence
Learning CPG sensory feedback with policy gradient for biped locomotion for a full-body humanoid

AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 3
Covariant policy search

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Natural actor-critic algorithms

Automatica (Journal of IFAC)
Real-time reinforcement learning by sequential Actor-Critics and experience replay

Neural Networks
Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning

Neural Computation
Direct heuristic dynamic programming for nonlinear tracking control with filtered tracking error

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
A Convergent Online Single Time Scale Actor Critic Algorithm

The Journal of Machine Learning Research
A cat-like robot real-time learning to run

ICANNGA'09 Proceedings of the 9th international conference on Adaptive and natural computing algorithms
Impedance learning for robotic contact tasks using natural actor-critic algorithm

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Error Bounds for Approximations from Projected Linear Equations

Mathematics of Operations Research
The Dynamics of Multi-Agent Reinforcement Learning

Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence
Kalman temporal differences

Journal of Artificial Intelligence Research
Preference-based policy iteration: leveraging preference learning for reinforcement learning

ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part I
An RLS-based natural actor-critic algorithm for locomotion of a two-linked robot arm

CIS'05 Proceedings of the 2005 international conference on Computational Intelligence and Security - Volume Part I
Learning to use the spectrum in self-configuring heterogenous networks: a logit equilibrium approach

Proceedings of the 5th International ICST Conference on Performance Evaluation Methodologies and Tools
Actor-critic algorithms for hierarchical Markov decision processes

Automatica (Journal of IFAC)
Approximate stochastic annealing for online control of infinite horizon Markov decision processes

Automatica (Journal of IFAC)
A comparative study of reinforcement learning techniques on dialogue management

EACL '12 Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics
A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems

Automatica (Journal of IFAC)
Two-step gradient-based reinforcement learning for underwater robotics behavior learning

Robotics and Autonomous Systems
An Actor-Critic based controller for glucose regulation in type 1 diabetes

Computer Methods and Programs in Biomedicine
2013 Special Issue: Autonomous reinforcement learning with experience replay

Neural Networks
Dynamic policy programming

The Journal of Machine Learning Research
Learning via human feedback in continuous state and action spaces

Applied Intelligence
Policy oscillation is overshooting

Neural Networks

Quantified Score

Hi-index	0.01

Abstract

In this article, we propose and analyze a class of actor-critic algorithms. These are two-time-scale algorithms in which the critic uses temporal difference learning with a linearly parameterized approximation architecture, and the actor is updated in an approximate gradient direction, based on information provided by the critic. We show that the features for the critic should ideally span a subspace prescribed by the choice of parameterization of the actor. We study actor-critic algorithms for Markov decision processes with Polish state and action spaces. We state and prove two results regarding their convergence.
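The structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a small finite MDP rather than Polish spaces, a tabular softmax actor, a TD(0) critic under an average-reward criterion, and arbitrarily chosen step-size schedules. The names `softmax_policy`, `compatible_features`, and `run_actor_critic` are hypothetical. The critic's features are taken to be the gradient of the log-policy, which is one common way to make them span a subspace determined by the actor's parameterization.

```python
import numpy as np

def softmax_policy(theta, s):
    """Action probabilities pi_theta(. | s) for a tabular softmax actor."""
    prefs = theta[s] - theta[s].max()  # shift for numerical stability
    p = np.exp(prefs)
    return p / p.sum()

def compatible_features(theta, s, a):
    """psi(s, a) = grad_theta log pi_theta(a | s), flattened to a vector."""
    grad = np.zeros_like(theta)
    grad[s] = -softmax_policy(theta, s)
    grad[s, a] += 1.0
    return grad.ravel()

def run_actor_critic(P, R, n_steps=20000, seed=0):
    """Two-time-scale actor-critic sketch (assumed setup, see lead-in).

    P[s, a] is a distribution over next states; R[s, a] is the reward.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    theta = np.zeros((n_states, n_actions))  # actor parameters
    w = np.zeros(theta.size)                 # critic weights (linear in psi)
    avg_reward = 0.0                         # running average-reward estimate
    s = 0
    a = rng.choice(n_actions, p=softmax_policy(theta, s))
    for k in range(1, n_steps + 1):
        # The critic runs on the faster time scale: actor_step / critic_step -> 0.
        critic_step = 0.1 * k ** -0.6
        actor_step = 0.1 * k ** -0.9

        r = R[s, a]
        s_next = rng.choice(n_states, p=P[s, a])
        a_next = rng.choice(n_actions, p=softmax_policy(theta, s_next))

        psi = compatible_features(theta, s, a)
        psi_next = compatible_features(theta, s_next, a_next)

        # Critic: average-reward TD(0) with a linear approximation w . psi.
        td_error = r - avg_reward + w @ psi_next - w @ psi
        avg_reward += critic_step * (r - avg_reward)
        w += critic_step * td_error * psi

        # Actor: step along the critic's estimate of the policy gradient.
        theta += actor_step * (w @ psi) * psi.reshape(theta.shape)

        s, a = s_next, a_next
    return theta, w, avg_reward
```

A call such as `run_actor_critic(P, R)` with `P` of shape `(n_states, n_actions, n_states)` and `R` of shape `(n_states, n_actions)` returns the actor parameters, the critic weights, and the average-reward estimate. The step-size exponents are illustrative only; the two-time-scale idea requires just that the actor's step size vanish relative to the critic's.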