We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite- and finite-horizon cases. We describe a Stochastic Approximation EM algorithm for likelihood maximization that, in the tabular case, is equivalent to a non-bootstrapping optimistic policy iteration algorithm, such as Sarsa(1), and applies to both MDPs and POMDPs. On the theoretical side, by relating the proposed stochastic EM algorithm to the family of optimistic policy iteration algorithms, we provide new tools for the design and analysis of algorithms in that family. On the practical side, preliminary experiments on a POMDP problem show encouraging results.
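The equivalence named in the abstract, Sarsa(1) as a non-bootstrapping optimistic policy iteration, can be illustrated with a minimal tabular sketch: every visited state-action pair is updated toward its full Monte Carlo return (no bootstrapping), with epsilon-greedy policy improvement between episodes. The corridor environment, constants, and function names below are illustrative assumptions, not taken from the paper.

```python
import random

random.seed(0)  # reproducibility for this illustrative run

# Hypothetical corridor MDP: states 0..N-1, state N is terminal.
# Actions: 0 = left, 1 = right; reward 1 on entering the terminal state.
N = 4
GAMMA = 0.95   # discount, so shorter paths earn higher returns
ALPHA = 0.1    # step size
EPS = 0.1      # epsilon-greedy exploration rate

def step(s, a):
    """One environment transition; returns (next_state, reward)."""
    s2 = min(s + 1, N) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N else 0.0)

Q = {(s, a): 0.0 for s in range(N) for a in (0, 1)}

def act(s):
    """Epsilon-greedy action with random tie-breaking."""
    if random.random() < EPS:
        return random.choice((0, 1))
    best = max(Q[(s, 0)], Q[(s, 1)])
    return random.choice([a for a in (0, 1) if Q[(s, a)] == best])

for _ in range(3000):
    s, traj = 0, []
    while s != N:                       # roll out one full episode
        a = act(s)
        s2, r = step(s, a)
        traj.append((s, a, r))
        s = s2
    # Sarsa(1): no bootstrapping. Update every visited pair toward
    # its full discounted Monte Carlo return.
    G = 0.0
    for (s, a, r) in reversed(traj):
        G = r + GAMMA * G
        Q[(s, a)] += ALPHA * (G - Q[(s, a)])

# Greedy action per state (1 = right) after training.
greedy = [max((0, 1), key=lambda a: Q[(s, a)]) for s in range(N)]
print(greedy)
```

Because the update target is the complete return rather than a one-step bootstrapped estimate, the same loop runs unchanged if the agent observes only partial state information, which is the sense in which such methods apply to POMDPs as well as MDPs.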