Bayesian Learning of Noisy Markov Decision Processes

  • Authors:
  • Sumeetpal S. Singh; Nicolas Chopin; Nick Whiteley

  • Affiliations:
  • University of Cambridge; CREST-ENSAE and HEC Paris; University of Bristol

  • Venue:
  • ACM Transactions on Modeling and Computer Simulation (TOMACS) - Special Issue on Monte Carlo Methods in Statistics
  • Year:
  • 2013

Abstract

We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking, a controller on the basis of observed state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how the latent variables of the model can be estimated and how predictions about actions can be made within a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. This sampler includes a parameter expansion step, which is shown to be essential for the good convergence properties of the chain. As an illustration, the method is applied to learning a human controller.
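To make the setting concrete, the following is a minimal illustrative sketch, not the authors' model or sampler: it assumes a hypothetical controller whose action probabilities are softmax in theta * Q[s][a] for a small, known table Q, simulates state/action data, and draws from the posterior of theta by plain random-walk Metropolis (without the parameter expansion step the paper shows to be important).

```python
import math
import random

random.seed(0)

# Hypothetical 2-state, 3-action value table (assumed known here).
Q = [[1.0, 0.2, -0.5],
     [0.0, 0.8, 0.3]]
THETA_TRUE = 1.5  # controller parameter used to generate synthetic data

def policy(theta, s):
    """Softmax action probabilities at state s for parameter theta."""
    logits = [theta * q for q in Q[s]]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample_action(theta, s):
    p, u, acc = policy(theta, s), random.random(), 0.0
    for a, pa in enumerate(p):
        acc += pa
        if u <= acc:
            return a
    return len(p) - 1

# Synthetic state/action data from the "controller".
data = [(s, sample_action(THETA_TRUE, s))
        for s in (random.randrange(2) for _ in range(300))]

def log_post(theta):
    lp = -0.5 * (theta / 2.0) ** 2          # N(0, 2^2) prior on theta
    for s, a in data:
        lp += math.log(policy(theta, s)[a])  # multinomial log-likelihood
    return lp

# Random-walk Metropolis over theta.
theta, lp = 0.0, log_post(0.0)
samples = []
for _ in range(4000):
    prop = theta + 0.3 * random.gauss(0.0, 1.0)
    lp_prop = log_post(prop)
    if math.log(random.random()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta)

post_mean = sum(samples[1000:]) / len(samples[1000:])
```

With 300 observations the posterior concentrates near the generating value, so the post-burn-in mean lands in the vicinity of 1.5; predictions about future actions would then be made by averaging policy(theta, s) over the posterior draws.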