A majority of approximate dynamic programming approaches to the reinforcement learning problem can be categorized into greedy value function methods and value-based policy gradient methods. The former approach, although fast, is well known to be susceptible to the policy oscillation phenomenon. We take a fresh view of this phenomenon by casting a considerable subset of the former approach, within the context of non-optimistic policy iteration, as a limiting special case of the latter. We explain the phenomenon in terms of this view and illustrate the underlying mechanism with artificial examples. We also use this view to derive the constrained natural actor-critic algorithm, which can interpolate between the two approaches. In addition, it has been suggested in the literature that the oscillation phenomenon might be subtly connected to the grossly suboptimal performance of all attempted approximate dynamic programming methods on the Tetris benchmark problem. Based on empirical findings, we offer a hypothesis that might explain the inferior performance levels and the associated policy degradation phenomenon, and which would partially support the suggested connection. Finally, we report scores on the Tetris problem that improve on existing dynamic-programming-based results by an order of magnitude.
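The limiting special case described above can be made concrete with a minimal sketch. Assume, for illustration only, a Gibbs (softmax) policy over linear features \(\phi(s,a)\) and a natural-gradient update in the style of Kakade's natural policy gradient; the notation below is introduced here for exposition and is not taken from the paper itself:

\[
\pi_\theta(a \mid s) \;=\; \frac{\exp\!\big(\theta^{\top}\phi(s,a)\big)}{\sum_{b}\exp\!\big(\theta^{\top}\phi(s,b)\big)},
\qquad
\theta_{k+1} \;=\; \theta_k + \eta\, w_k,
\]

where \(w_k\) denotes the compatible least-squares approximation weights for \(Q^{\pi_{\theta_k}}\). As the step size \(\eta \to \infty\), the updated policy \(\pi_{\theta_{k+1}}\) concentrates its mass on \(\arg\max_a w_k^{\top}\phi(s,a)\), so the update degenerates into a greedy policy-improvement step; keeping \(\eta\) finite and bounded, as in a constrained natural actor-critic, interpolates between the gradient and greedy regimes.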