We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy-iteration-like algorithm for finding the optimal state costs or Q-factors. The main difference lies in the policy evaluation phase: instead of solving a linear system of equations, our algorithm requires solving an optimal stopping problem. The solution of this problem may be inexact, obtained with a finite number of value iterations, in the spirit of modified policy iteration. The stopping problem structure is incorporated into the standard Q-learning algorithm to obtain a new method that is intermediate between policy iteration and Q-learning/value iteration. Thanks to its special contraction properties, our method overcomes some of the traditional convergence difficulties of modified policy iteration and admits asynchronous deterministic and stochastic iterative implementations, with lower overhead and/or more reliable convergence than existing Q-learning schemes. Furthermore, for large-scale problems, where linear basis function approximations and simulation-based temporal difference implementations are used, our algorithm effectively addresses the inherent difficulties of approximate policy iteration due to inadequate exploration of the state and control spaces.
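The loop described above can be sketched schematically: policy improvement picks the greedy policy from the current Q-factors, and policy evaluation runs a finite number of value iterations on a stopping problem whose "stop" option falls back to the current cost estimate J while "continue" follows the improved policy. The tiny random MDP, the update formula, and all names below are illustrative assumptions for this sketch, not the authors' exact algorithm.

```python
import numpy as np

np.random.seed(0)
nS, nA, alpha = 4, 2, 0.9                              # states, actions, discount
P = np.random.dirichlet(np.ones(nS), size=(nS, nA))    # P[i, u, j]: transition probs
g = np.random.rand(nS, nA)                             # expected one-stage cost g(i, u)

Q = np.zeros((nS, nA))
for _ in range(200):                   # outer policy-iteration-like loop
    J = Q.min(axis=1)                  # current optimal-cost estimate
    mu = Q.argmin(axis=1)              # policy improvement: greedy policy
    # Inexact policy evaluation: a few value iterations on the stopping
    # problem -- "stop" yields J(j), "continue" follows mu at state j.
    for _ in range(5):
        stop_or_go = np.minimum(J, Q[np.arange(nS), mu])
        Q = g + alpha * P @ stop_or_go             # P @ v -> shape (nS, nA)

# Sanity check: plain Q-value iteration on the same model reaches the
# same optimal Q-factors.
Q_vi = np.zeros((nS, nA))
for _ in range(2000):
    Q_vi = g + alpha * P @ Q_vi.min(axis=1)

print(np.max(np.abs(Q - Q_vi)))        # the two solutions agree closely
```

The `min(J, Q(·, mu))` clamp is what makes the inner loop a stopping problem rather than a linear-system solve: it caps the continuation cost by the current estimate J, which is the source of the contraction property the abstract alludes to.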