Analyzing and escaping local optima in planning as inference for partially observable domains
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II
In this paper we build on previous work that uses inference techniques, in particular Markov chain Monte Carlo (MCMC) methods, to solve parameterized control problems. We propose a number of modifications to make this approach practical in general, higher-dimensional spaces. We first introduce a new target distribution that incorporates more reward information from sampled trajectories. We also show how to break the strong correlations between the policy parameters and sampled trajectories, allowing the sampler to move more freely. Finally, we show how to combine these techniques in a principled manner to obtain estimates of the optimal policy.
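To make the general idea concrete, the sketch below illustrates MCMC-based policy search in its simplest form: Metropolis-Hastings sampling of a policy parameter from a target density proportional to the exponentiated return. This is only a toy illustration of the planning-as-inference idea, not the paper's actual algorithm; the one-dimensional problem, the `expected_return` surrogate, and all names are hypothetical, and a real instance would estimate returns from sampled trajectories rather than a closed-form reward.

```python
import math
import random

def expected_return(theta):
    # Hypothetical surrogate for a rollout: a deterministic return
    # that peaks at theta = 2. A real control problem would estimate
    # this from sampled trajectories.
    return -(theta - 2.0) ** 2

def mh_policy_search(n_samples=5000, step=0.5, seed=0):
    """Metropolis-Hastings over a scalar policy parameter theta,
    targeting a density proportional to exp(expected_return(theta))."""
    rng = random.Random(seed)
    theta = 0.0
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        # Symmetric Gaussian proposal, so the acceptance probability is
        # min(1, exp(R(proposal) - R(theta))).
        log_accept = expected_return(proposal) - expected_return(theta)
        if math.log(rng.random()) < log_accept:
            theta = proposal
        samples.append(theta)
    return samples

samples = mh_policy_search()
# After burn-in, the samples concentrate near the optimum theta = 2,
# so their mean serves as a simple estimate of the optimal policy.
estimate = sum(samples[1000:]) / len(samples[1000:])
```

The target `exp(R(theta))` here is a normal density centered on the optimum, which is why the posterior mean recovers the optimal parameter; in higher dimensions and with noisy trajectory-based returns, this basic sampler mixes poorly, which is the kind of difficulty the paper's modifications are aimed at.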