Bayesian nonparametric inverse reinforcement learning

  • Authors:
  • Bernard Michini; Jonathan P. How

  • Affiliations:
  • Massachusetts Institute of Technology, Cambridge, Massachusetts; Massachusetts Institute of Technology, Cambridge, Massachusetts

  • Venue:
  • ECML PKDD'12: Proceedings of the 2012 European Conference on Machine Learning and Knowledge Discovery in Databases, Part II
  • Year:
  • 2012


Abstract

Inverse reinforcement learning (IRL) is the task of learning the reward function of a Markov Decision Process (MDP) given the transition function and a set of observed demonstrations in the form of state-action pairs. Current IRL algorithms attempt to find a single reward function that explains the entire observation set. In practice, this leads to a computationally costly search over a large (typically infinite) space of complex reward functions. This paper proposes the notion that if the observations can be partitioned into smaller groups, a class of much simpler reward functions can be used to explain each group. The proposed method uses a Bayesian nonparametric mixture model to automatically partition the data and find a set of simple reward functions corresponding to each partition. The simple rewards are interpreted intuitively as subgoals, which can be used to predict actions or to analyze which states are important to the demonstrator. Experimental results on simple examples show performance comparable to that of other IRL algorithms in nominal situations. Moreover, the proposed method handles cyclic tasks (in which the agent begins and ends in the same state) that would break existing algorithms without modification. Finally, the new algorithm has a fundamentally different structure than previous methods, making it more computationally efficient in a real-world learning scenario where the state space is large but the demonstration set is small.
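The abstract describes the method only at a high level, so the Python sketch below is a minimal illustration of the general idea rather than the authors' algorithm. It assumes a toy 1-D gridworld, models each mixture component's "simple reward" as an indicator on a single subgoal state, models the demonstrator as Boltzmann-rational, and runs a Gibbs sampler over a Chinese restaurant process (CRP) prior to jointly partition state-action pairs and infer one subgoal per partition. All names and parameter values (`N_STATES`, `BETA`, `ALPHA`, the deterministic transitions) are illustrative assumptions.

```python
import numpy as np

# Hypothetical 1-D gridworld: states 0..N_STATES-1, actions move left/right.
N_STATES = 10
ACTIONS = (-1, +1)   # action index 0 = left, 1 = right
GAMMA = 0.9          # discount factor (assumed)
BETA = 5.0           # Boltzmann rationality of the demonstrator (assumed)
ALPHA = 1.0          # CRP concentration parameter (assumed)

def q_values(goal):
    """Q-function for the simple 'subgoal' reward: 1 on reaching `goal`, else 0."""
    V = np.zeros(N_STATES)
    for _ in range(100):  # value iteration
        Q = np.empty((N_STATES, len(ACTIONS)))
        for s in range(N_STATES):
            for ai, a in enumerate(ACTIONS):
                s2 = min(max(s + a, 0), N_STATES - 1)  # deterministic transition
                Q[s, ai] = (1.0 if s2 == goal else 0.0) + GAMMA * V[s2]
        V = Q.max(axis=1)
    return Q

# Precompute P(action | state, subgoal) under a Boltzmann (softmax) policy.
LIK = np.empty((N_STATES, N_STATES, len(ACTIONS)))  # indexed [goal, state, action]
for g in range(N_STATES):
    e = np.exp(BETA * q_values(g))
    LIK[g] = e / e.sum(axis=1, keepdims=True)

def pair_lik(goal, sa):
    s, ai = sa
    return LIK[goal, s, ai]

def gibbs(demos, n_iter=200, rng=np.random.default_rng(0)):
    """CRP mixture over state-action pairs; one candidate subgoal per cluster."""
    z = np.zeros(len(demos), dtype=int)    # cluster assignment of each pair
    goals = [int(rng.integers(N_STATES))]  # subgoal of each cluster
    for _ in range(n_iter):
        for i, sa in enumerate(demos):
            counts = np.bincount(np.delete(z, i), minlength=len(goals))
            w = [counts[k] * pair_lik(g, sa) for k, g in enumerate(goals)]
            # Weight of opening a new cluster: alpha times the likelihood
            # marginalized over a uniform prior on subgoal locations.
            w.append(ALPHA * np.mean([pair_lik(g, sa) for g in range(N_STATES)]))
            w = np.asarray(w) / sum(w)
            k = rng.choice(len(w), p=w)
            if k == len(goals):  # new cluster: draw its subgoal from the posterior
                p = np.array([pair_lik(g, sa) for g in range(N_STATES)])
                goals.append(int(rng.choice(N_STATES, p=p / p.sum())))
            z[i] = k
        for k in range(len(goals)):  # resample each cluster's subgoal
            members = [demos[i] for i in range(len(demos)) if z[i] == k]
            logp = np.array([sum(np.log(pair_lik(g, sa)) for sa in members)
                             for g in range(N_STATES)])
            p = np.exp(logp - logp.max())
            goals[k] = int(rng.choice(N_STATES, p=p / p.sum()))
    return z, goals

# A cyclic demonstration: walk right from state 0 to 9, then back left to 0.
demos = [(s, 1) for s in range(9)] + [(s, 0) for s in range(9, 0, -1)]
z, goals = gibbs(demos)
print("inferred subgoals:", sorted({goals[k] for k in z}))  # ideally {0, 9}
```

The final lines exercise the cyclic case mentioned in the abstract: a demonstration that starts and ends at state 0 cannot be explained by a single stationary subgoal, but it partitions naturally into "go right to 9" and "return to 0" segments, each of which has a very simple reward.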