We consider the inverse reinforcement learning problem: learning from, and then predicting or mimicking, a controller on the basis of state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how the latent variables of the model can be estimated and how predictions about actions can be made within a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. The sampler includes a parameter-expansion step, which is shown to be essential for its good convergence properties. As an illustration, the method is applied to learning a human controller.
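To give a feel for the kind of machinery involved, the sketch below is a minimal, self-contained illustration (not the paper's actual model) of Bayesian inference on binary state/action data via data augmentation with a parameter-expansion (PX-DA) step. It uses an Albert–Chib-style probit action model, P(a = 1 | s) = Φ(θs), as a stand-in for an MDP policy; all variable names and the one-dimensional setup are assumptions made for the example.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

# Simulate binary "action" data from a probit choice model,
# P(a_i = 1 | s_i) = Phi(theta * s_i), as a stand-in for an MDP policy.
n, theta_true = 500, 1.5
s = rng.normal(size=n)                                      # observed states
a = (theta_true * s + rng.normal(size=n) > 0).astype(int)   # observed actions

# Gibbs sampler with latent utilities (data augmentation) plus a
# parameter-expansion rescaling step to improve mixing.
n_iter, burn = 2000, 500
theta, draws = 0.0, []
sts = s @ s                                  # scalar X'X
lo = np.where(a == 1, 0.0, -np.inf)          # truncation bounds set by action
hi = np.where(a == 1, np.inf, 0.0)
for it in range(n_iter):
    # 1. Latent utilities z_i ~ N(theta*s_i, 1), truncated by the action taken.
    mu = theta * s
    z = truncnorm.rvs(lo - mu, hi - mu, loc=mu, scale=1.0, random_state=rng)
    # 2. Parameter expansion: draw the redundant scale alpha^2 from its
    #    conditional (residual sum of squares over a chi^2_n draw) and
    #    rescale z -- the step that speeds up convergence.
    theta_hat = (s @ z) / sts
    rss = z @ z - sts * theta_hat**2
    alpha2 = rss / rng.chisquare(n)
    z = z / np.sqrt(alpha2)
    # 3. Regression step: theta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1}), flat prior.
    theta = (s @ z) / sts + rng.normal() / np.sqrt(sts)
    if it >= burn:
        draws.append(theta)

theta_mean = float(np.mean(draws))
print(theta_mean)  # posterior mean of theta, recovering roughly theta_true
```

Without step 2 the sampler is plain data augmentation, which is known to mix slowly when the latent utilities and θ are strongly coupled; the rescaling by the redundant parameter α breaks that coupling while leaving the posterior invariant.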