Actor-Critic–Type Learning Algorithms for Markov Decision Processes

Authors:
Vijaymohan R. Konda;Vivek S. Borkar
Venue:
SIAM Journal on Control and Optimization
Year:
1999

Citing 0
Cited 19

From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning
Discrete Event Dynamic Systems

Learning Time Allocation Using Neural Networks
CG '00 Revised Papers from the Second International Conference on Computers and Games

A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis
Machine Learning

Convergence of Simulation-Based Policy Iteration
Probability in the Engineering and Informational Sciences

A Learning Algorithm for Discrete-Time Stochastic Control
Probability in the Engineering and Informational Sciences

Adaptive Importance Sampling Technique for Markov Chains Using Stochastic Approximation
Operations Research

Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation
The Journal of Machine Learning Research

STEWARD: demo of spatio-textual extraction on the web aiding the retrieval of documents
dg.o '07 Proceedings of the 8th annual international conference on Digital government research: bridging disciplines & domains

Brief paper: New algorithms of the Q-learning type
Automatica (Journal of IFAC)

Simulation-Based Optimization Approach for Software Cost Model with Rejuvenation
ATC '08 Proceedings of the 5th international conference on Autonomic and Trusted Computing

Simulation-Based Optimization Algorithms for Finite-Horizon Markov Decision Processes
Simulation

Reinforcement Learning: A Tutorial Survey and Recent Advances
INFORMS Journal on Computing

Geometric variance reduction in Markov chains: application to value function and gradient estimation
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 2

Natural actor-critic algorithms
Automatica (Journal of IFAC)

A Convergent Online Single Time Scale Actor Critic Algorithm
The Journal of Machine Learning Research

Learning to use the spectrum in self-configuring heterogenous networks: a logit equilibrium approach
Proceedings of the 5th International ICST Conference on Performance Evaluation Methodologies and Tools

Book reviews: Self-learning control of finite Markov chains
Automatica (Journal of IFAC)

Actor-critic algorithms for hierarchical Markov decision processes
Automatica (Journal of IFAC)

An actor-critic algorithm for multi-agent learning in queue-based stochastic games
Neurocomputing

Abstract

Algorithms for learning the optimal policy of a Markov decision process (MDP) based on simulated transitions are formulated and analyzed. These are variants of the well-known "actor-critic" (or "adaptive critic") algorithm in the artificial intelligence literature. Distributed asynchronous implementations are considered. The analysis involves two-time-scale stochastic approximations.
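
The two-time-scale idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the two-state MDP, the discounted-reward criterion, the softmax policy, the step-size schedules, and every name below (`simulate`, `actor_critic`, `critic_step`, `actor_step`) are assumptions chosen for illustration. The critic updates a value estimate with the larger step size, and the actor adjusts policy preferences with a step size that becomes negligible relative to the critic's.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def simulate(state, action, rng):
    """Hypothetical simulator: action 1 usually leads to state 1.

    Entering state 1 pays reward 1, entering state 0 pays 0.
    The current state does not affect the transition in this toy model.
    """
    p_to_1 = 0.9 if action == 1 else 0.1
    next_state = 1 if rng.random() < p_to_1 else 0
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def actor_critic(steps=20000, gamma=0.9, seed=0):
    """Tabular actor-critic driven by simulated transitions (a sketch)."""
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    value = [0.0] * n_states                               # critic
    prefs = [[0.0] * n_actions for _ in range(n_states)]   # actor
    state = 0
    for n in range(1, steps + 1):
        # Assumed schedules: actor_step / critic_step tends to 0,
        # so the critic runs on the faster time scale.
        critic_step = 1.0 / n ** 0.6
        actor_step = 10.0 / (100.0 + n)

        probs = softmax(prefs[state])
        action = rng.choices(range(n_actions), weights=probs)[0]
        next_state, reward = simulate(state, action, rng)

        # Temporal-difference error computed from the critic's estimate.
        td_error = reward + gamma * value[next_state] - value[state]

        # Fast time scale: critic update.
        value[state] += critic_step * td_error

        # Slow time scale: actor update along the softmax score direction.
        for a in range(n_actions):
            indicator = 1.0 if a == action else 0.0
            prefs[state][a] += actor_step * td_error * (indicator - probs[a])

        state = next_state
    return value, [softmax(p) for p in prefs]


if __name__ == "__main__":
    value, policy = actor_critic()
    print("value estimates:", value)
    print("policy:", policy)
```

Because the actor moves slowly, the critic sees an almost fixed policy and can track its value, while the actor sees an almost converged critic. In this toy model the learned policy should come to favor action 1 in both states.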