Model-free reinforcement learning as mixture learning

Authors:
Nikos Vlassis; Marc Toussaint
Affiliations:
Technical University of Crete, Chania, Greece; TU Berlin, Berlin, Germany
Venue:
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Year:
2009

Abstract

We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite-horizon and finite-horizon cases. We describe a Stochastic Approximation EM algorithm for likelihood maximization that, in the tabular case, is equivalent to a non-bootstrapping optimistic policy iteration algorithm like Sarsa(1) and can be applied to both MDPs and POMDPs. On the theoretical side, by relating the proposed stochastic EM algorithm to the family of optimistic policy iteration algorithms, we provide new tools that permit the design and analysis of algorithms in that family. On the practical side, preliminary experiments on a POMDP problem demonstrated encouraging results.
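
To make the phrase "non-bootstrapping optimistic policy iteration" concrete, the following is a minimal sketch of a generic tabular method from that family. It is not the paper's Stochastic Approximation EM algorithm. It updates Q-values toward full Monte Carlo returns (no bootstrapping) and lets the epsilon-greedy policy change after every episode (the "optimistic" part). The toy chain task, the function names, and all parameter values are assumptions made for illustration only.

```python
import random

N_STATES = 5          # hypothetical toy chain: states 0..4, state 4 is terminal
LEFT, RIGHT = 0, 1

def discounted_returns(rewards, gamma):
    """Full Monte Carlo return from each time step (no bootstrapping)."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

def run_episode(q, epsilon, rng, max_steps=50):
    """Roll out one epsilon-greedy episode; reward 1.0 on reaching the last state."""
    state, states, actions, rewards = 0, [], [], []
    for _ in range(max_steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)                      # explore
        else:                                              # greedy, ties go RIGHT
            action = LEFT if q[state][LEFT] > q[state][RIGHT] else RIGHT
        next_state = max(state - 1, 0) if action == LEFT else state + 1
        states.append(state)
        actions.append(action)
        rewards.append(1.0 if next_state == N_STATES - 1 else 0.0)
        state = next_state
        if state == N_STATES - 1:
            break
    return states, actions, rewards

def optimistic_policy_iteration(episodes=2000, gamma=0.9, alpha=0.05,
                                epsilon=0.1, seed=0):
    """After every episode, move Q toward the observed returns; the policy
    is greedy in the current Q, so it is revised before Q has converged."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        states, actions, rewards = run_episode(q, epsilon, rng)
        for s, a, g in zip(states, actions, discounted_returns(rewards, gamma)):
            q[s][a] += alpha * (g - q[s][a])               # target is the full return
    return q

if __name__ == "__main__":
    for state, (q_left, q_right) in enumerate(optimistic_policy_iteration()):
        print(state, round(q_left, 3), round(q_right, 3))
```

Because the update target is the sampled return rather than a one-step estimate built from other Q-values, the same update can be run unchanged when the table is indexed by observations instead of states, which is one reason methods of this kind are considered for POMDPs with memoryless policies.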