We present an online simulation-based algorithm called Approximate Stochastic Annealing (ASA) for solving infinite-horizon Markov decision processes (MDPs) with finite state and action spaces. The algorithm estimates the optimal policy by sampling, at each iteration, from a probability distribution over the policy space; this distribution is updated iteratively using Q-function estimates obtained via a Q-learning-type recursion. By exploiting a novel connection between ASA and the stochastic approximation method, we show that the sequence of distributions generated by the algorithm converges to a degenerate distribution concentrated on the optimal policy. Numerical examples are also provided to illustrate the algorithm.
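To make the loop structure concrete, the following is a minimal sketch of an ASA-style iteration on a toy two-state MDP. The MDP itself, the step sizes, and the product-form (per-state) policy distribution are illustrative assumptions for this sketch, not the paper's exact construction: each step samples an action from the current distribution, applies a Q-learning-type update to the Q-function estimate, and shifts the distribution toward the policy that is greedy with respect to the current estimates.

```python
import random

GAMMA = 0.9     # discount factor
ALPHA = 0.1     # Q-learning step size
BETA = 0.005    # step size for the policy-distribution update
STATES, ACTIONS = 2, 2

def step(s, a):
    """Toy MDP (assumed for illustration): action 1 yields reward 1,
    and the chosen action determines the next state."""
    return a, float(a)  # (next_state, reward)

def sample_action(theta, s, rng):
    """Sample an action from the current per-state distribution."""
    return 0 if rng.random() < theta[s][0] else 1

rng = random.Random(0)
Q = [[0.0] * ACTIONS for _ in range(STATES)]
# theta[s][a]: marginal probability of action a in state s, i.e. a
# product-form distribution over deterministic policies, initially uniform.
theta = [[0.5, 0.5] for _ in range(STATES)]

s = 0
for _ in range(20000):
    a = sample_action(theta, s, rng)          # sample from the distribution
    s_next, r = step(s, a)
    # Q-learning-type recursion for the Q-function estimate
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])
    # Shift the distribution toward the policy that is greedy w.r.t. Q
    greedy = max(range(ACTIONS), key=lambda b: Q[s][b])
    for b in range(ACTIONS):
        target = 1.0 if b == greedy else 0.0
        theta[s][b] += BETA * (target - theta[s][b])
    s = s_next

# Greedy policy recovered from the learned Q-function estimates;
# in this toy MDP the optimal policy takes action 1 in every state.
greedy_policy = [max(range(ACTIONS), key=lambda b: Q[s][b])
                 for s in range(STATES)]
print(greedy_policy)
print([round(theta[s][1], 2) for s in range(STATES)])
```

In this sketch the distribution `theta` plays the role of the iteratively updated sampling distribution, and its drift toward the greedy policy mirrors the concentration on the optimal policy established in the convergence analysis.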