We present online nested expectation maximization for model-free reinforcement learning in a POMDP. The algorithm evaluates the policy only on the current learning episode, then discards the episode and retains only a sufficient statistic, from which the policy is computed in closed form. As a result, the online algorithm has O(n) time complexity and O(1) memory complexity, compared with O(n²) and O(n) for the corresponding batch-mode algorithm, where n is the number of learning episodes. The online algorithm, which provably converges, is demonstrated on five benchmark POMDP problems.
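To make the complexity contrast concrete, below is a minimal Python sketch of the online pattern the abstract describes: each episode is reduced to a fixed-size statistic, folded into a running statistic, and discarded, so only O(1) state persists across episodes. The choice of statistic, its running-average combination rule, and the names episode_statistic and policy_from_statistic are illustrative assumptions, not the paper's actual E- and M-steps.

```python
import numpy as np

rng = np.random.default_rng(0)

def episode_statistic(episode):
    """Stand-in for the per-episode evaluation (E-step): reduce one
    episode to a fixed-size statistic (here, a mean over its steps).
    This placeholder is an assumption, not the paper's statistic."""
    return np.mean(episode, axis=0)

def policy_from_statistic(stat):
    """Stand-in for the closed-form policy update (M-step): compute
    policy parameters directly from the running sufficient statistic."""
    return stat / (np.linalg.norm(stat) + 1e-12)

# Online loop: O(1) memory -- only `stat` persists between episodes,
# and each episode is touched exactly once, giving O(n) total time.
stat = np.zeros(4)
for n in range(1, 1001):
    episode = rng.normal(size=(50, 4))   # placeholder episode data
    stat += (episode_statistic(episode) - stat) / n  # running average
    policy = policy_from_statistic(stat)
    # `episode` is discarded here: nothing per-episode is stored.
```

A batch-mode variant would instead store all n episodes and re-scan them at every update, which is where the O(n) memory and O(n²) total time cited in the abstract come from.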