It is shown here that stability of the stochastic approximation algorithm is implied by the asymptotic stability of the origin for an associated ODE, and that this in turn implies convergence of the algorithm. Several specific classes of algorithms are considered as applications. The results provide (i) a simpler derivation of known results for reinforcement learning algorithms; (ii) the first proof that a class of asynchronous stochastic approximation algorithms is convergent without any a priori assumption of stability; and (iii) the first proof that asynchronous adaptive critic and Q-learning algorithms are convergent for the average cost optimal control problem.
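The connection between the algorithm and its associated ODE can be illustrated with a minimal Robbins-Monro sketch (an illustrative example under assumed settings, not code from the paper): the iterates x_{n+1} = x_n + a_n (Y_n - x_n) track the ODE ẋ = E[Y] - x, whose unique equilibrium E[Y] is globally asymptotically stable, so the iterates remain bounded and converge to E[Y].

```python
import random


def stochastic_approximation(sample, x0=0.0, n_iter=20000, seed=0):
    """Robbins-Monro iteration x_{n+1} = x_n + a_n * (Y_n - x_n).

    The associated ODE is dx/dt = E[Y] - x; its equilibrium E[Y] is
    globally asymptotically stable, which (per the stability result
    above) yields boundedness and convergence of the iterates.
    """
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_iter + 1):
        a_n = 1.0 / n          # stepsizes: sum a_n = inf, sum a_n^2 < inf
        y = sample(rng)        # noisy observation with mean E[Y]
        x += a_n * (y - x)     # move x toward the current sample
    return x


# Example: samples from Uniform(0, 1), so the iterates converge to 0.5.
est = stochastic_approximation(lambda rng: rng.random())
```

With the 1/n stepsize this particular iteration reduces to the running sample mean; more general stepsize sequences satisfying the summability conditions above yield the same limit.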