It is shown here that stability of the stochastic approximation algorithm is implied by the asymptotic stability of the origin for an associated ODE, and that this in turn implies convergence of the algorithm. Several specific classes of algorithms are considered as applications. The results provide (i) a simpler derivation of known results for reinforcement learning algorithms; (ii) the first proof that a class of asynchronous stochastic approximation algorithms is convergent without any a priori assumption of stability; and (iii) the first proof that asynchronous adaptive critic and Q-learning algorithms are convergent for the average cost optimal control problem.
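The connection between the algorithm and its associated ODE can be illustrated with a minimal Robbins-Monro sketch (an illustrative example under assumed settings, not code from the paper): the iterates x_{n+1} = x_n + a_n (Y_n - x_n) track the ODE ẋ = E[Y] - x, whose unique equilibrium E[Y] is globally asymptotically stable, so the iterates remain bounded and converge to E[Y].

```python
import random


def stochastic_approximation(sample, x0=0.0, n_iter=20000, seed=0):
    """Robbins-Monro iteration x_{n+1} = x_n + a_n * (Y_n - x_n).

    The associated ODE is dx/dt = E[Y] - x; its equilibrium E[Y] is
    globally asymptotically stable, which (per the stability result
    above) yields boundedness and convergence of the iterates.
    """
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_iter + 1):
        a_n = 1.0 / n          # stepsizes: sum a_n = inf, sum a_n^2 < inf
        y = sample(rng)        # noisy observation with mean E[Y]
        x += a_n * (y - x)     # move x toward the current sample
    return x


# Example: samples from Uniform(0, 1), so the iterates converge to 0.5.
est = stochastic_approximation(lambda rng: rng.random())
```

With the 1/n stepsize this particular iteration reduces to the running sample mean; more general stepsize sequences satisfying the summability conditions above yield the same limit.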