Factorial Hidden Markov Models
Machine Learning - Special issue on learning with probabilistic representations
Reinforcement Learning
Computing Factored Value Functions for Policies in Structured MDPs
IJCAI '99 Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence
A family of algorithms for approximate Bayesian inference
Dynamic Bayesian networks: representation, inference and learning
Efficient solution algorithms for factored MDPs
Journal of Artificial Intelligence Research
Policy recognition in the abstract hidden Markov model
Journal of Artificial Intelligence Research
An MCMC approach to solving hybrid factored MDPs
IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Exploiting structure in policy construction
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2
Hierarchical solution of Markov decision processes using macro-actions
UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence
Approximate inference for planning in stochastic relational worlds
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Model-free reinforcement learning as mixture learning
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Inference and Learning for Active Sensing, Experimental Design and Control
IbPRIA '09 Proceedings of the 4th Iberian Conference on Pattern Recognition and Image Analysis
Relevance Grounding for Planning in Relational Domains
ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I
New inference strategies for solving Markov decision processes using reversible jump MCMC
UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
Planning with noisy probabilistic relational rules
Journal of Artificial Intelligence Research
Analyzing and escaping local optima in planning as inference for partially observable domains
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II
Influence diagrams with memory states: representation and algorithms
ADT'11 Proceedings of the Second international conference on Algorithmic decision theory
Scalable multiagent planning using probabilistic inference
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Three
Active learning of inverse models with intrinsically motivated goal exploration in robots
Robotics and Autonomous Systems
From dynamic movement primitives to associative skill memories
Robotics and Autonomous Systems
Monte-Carlo expectation maximization for decentralized POMDPs
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Inference in Markov Decision Processes has recently received interest as a means to infer the goals of an observed actor, for policy recognition, and as a tool to compute policies. A particularly attractive aspect of the approach is that any existing inference technique for DBNs becomes available for answering behavioral questions, including those involving continuous, factorial, or hierarchical state representations. Here we present an Expectation Maximization algorithm for computing optimal policies. Unlike previous approaches, we show that this actually optimizes the discounted expected future return for arbitrary reward functions, without assuming an ad hoc finite total time. The algorithm is generic in that any inference technique can be used in the E-step. We demonstrate this for exact inference on a discrete maze and for Gaussian belief state propagation in continuous stochastic optimal control problems.
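To make the abstract's idea concrete, the following is a minimal sketch of EM-style policy computation on a tiny discrete MDP. The E-step is done exactly (backward messages here amount to computing Q-values under the current stochastic policy), and the M-step reweights the policy multiplicatively, which drives it toward the greedy policy. The 2-state, 2-action MDP and all of its numbers are made up purely for illustration and are not from the paper:

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
# P[a, s, s'] = transition probability, R[s, a] = expected reward.
gamma = 0.9
P = np.array([[[0.9, 0.1],     # action 0: mostly stay in current state
               [0.1, 0.9]],
              [[0.5, 0.5],     # action 1: noisy "switch" action
               [0.5, 0.5]]])
R = np.array([[0.0, 0.0],      # state 0 yields no reward
              [1.0, 1.0]])     # state 1 is rewarding under either action
pi = np.full((2, 2), 0.5)      # policy pi[s, a], initialised uniformly

for _ in range(200):
    # E-step (exact inference): evaluate the current policy.
    P_pi = np.einsum('sa,ast->st', pi, P)      # state transitions under pi
    R_pi = (pi * R).sum(axis=1)                # expected reward under pi
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)
    Q = R + gamma * np.einsum('ast,t->sa', P, V)

    # M-step: multiplicative policy reweighting (rewards are nonnegative,
    # so Q >= 0 and the update stays a valid distribution over actions).
    pi = pi * Q
    pi /= pi.sum(axis=1, keepdims=True)

# The policy concentrates on switching in state 0 and staying in state 1.
print(pi.round(3))
```

The multiplicative M-step acts as a soft policy-improvement operator: each iteration shifts probability mass toward actions with higher Q-value, so the iterates converge to the greedy (optimal) policy for this small example.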