Acting optimally in partially observable stochastic domains
AAAI'94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 2)
Learning agents for uncertain environments (extended abstract)
COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Algorithms for Inverse Reinforcement Learning
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Finite-memory control of partially observable systems
Apprenticeship learning via inverse reinforcement learning
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Heuristic search value iteration for POMDPs
UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence
Learning structured prediction models: a large margin approach
ICML '05 Proceedings of the 22nd international conference on Machine learning
ICML '06 Proceedings of the 23rd international conference on Machine learning
Partially observable Markov decision processes for spoken dialog systems
Computer Speech and Language
Probabilistic planning for robotic exploration
Apprenticeship learning using linear programming
ICML '08 Proceedings of the 25th international conference on Machine learning
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
Maximum entropy inverse reinforcement learning
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 3
Perseus: randomized point-based value iteration for POMDPs
Journal of Artificial Intelligence Research
Anytime point-based approximations for large POMDPs
Journal of Artificial Intelligence Research
Bayesian inverse reinforcement learning
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Planning and acting in partially observable stochastic domains
Artificial Intelligence
Inverse reinforcement learning in partially observable environments
IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Training parsers by inverse reinforcement learning
Machine Learning
Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework
IEEE Journal on Selected Areas in Communications
Bayesian multitask inverse reinforcement learning
EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
Inverse reinforcement learning (IRL) is the problem of recovering the underlying reward function from the behavior of an expert. Most existing IRL algorithms assume that the environment is modeled as a Markov decision process (MDP), although handling partially observable settings is desirable for more realistic scenarios. In this paper, we present IRL algorithms for partially observable environments that can be modeled as a partially observable Markov decision process (POMDP). We deal with two cases according to the representation of the given expert's behavior: the case in which the expert's policy is explicitly given, and the case in which only the expert's trajectories are available. IRL in POMDPs poses a greater challenge than in MDPs, since it is not only ill-posed by the nature of IRL but also computationally intractable due to the hardness of solving POMDPs. To overcome these obstacles, we present algorithms that exploit classical results from the POMDP literature. Experimental results on several benchmark POMDP domains show that our approach is useful in partially observable settings.
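To make the underlying problem concrete, the following is a minimal sketch of the classical linear-programming IRL formulation for a small, fully observable MDP with known dynamics and an explicitly given expert policy, in the spirit of Ng and Russell's "Algorithms for Inverse Reinforcement Learning" cited above. It is illustrative only: the function name, the L1-regularization weight, and the toy two-state MDP below are assumptions of this sketch, not the paper's POMDP algorithms.

```python
import numpy as np
from scipy.optimize import linprog

def lp_irl(P, expert_policy, gamma=0.9, r_max=1.0, lam=1.1):
    """Recover a state-reward vector consistent with an expert policy.

    P has shape (A, S, S): P[a, s, s'] is the transition probability.
    expert_policy is a length-S array of expert actions.
    Solves the Ng-Russell LP: maximize the sum of per-state optimality
    margins minus lam * ||R||_1, subject to |R_s| <= r_max.
    """
    n_actions, n_states, _ = P.shape
    a_star = expert_policy
    # Transition matrix induced by following the expert policy.
    P_star = np.array([P[a_star[s], s] for s in range(n_states)])
    inv = np.linalg.inv(np.eye(n_states) - gamma * P_star)

    # LP variables: [R (S), t (S), u (S)]; t are margins, u bounds |R|.
    n = 3 * n_states
    c = np.zeros(n)
    c[n_states:2 * n_states] = -1.0   # maximize sum of margins t
    c[2 * n_states:] = lam            # ... minus lam * ||R||_1

    A_ub, b_ub = [], []
    for s in range(n_states):
        for a in range(n_actions):
            if a == a_star[s]:
                continue
            # Row vector: (P_{a*}(s) - P_a(s)) (I - gamma P_{a*})^{-1}
            diff = (P[a_star[s], s] - P[a, s]) @ inv
            # diff @ R >= t_s  (margin) and diff @ R >= 0 (optimality)
            row = np.zeros(n); row[:n_states] = -diff; row[n_states + s] = 1.0
            A_ub.append(row); b_ub.append(0.0)
            row = np.zeros(n); row[:n_states] = -diff
            A_ub.append(row); b_ub.append(0.0)
    # Encode u_s >= |R_s| for the L1 penalty.
    for s in range(n_states):
        row = np.zeros(n); row[s] = 1.0; row[2 * n_states + s] = -1.0
        A_ub.append(row); b_ub.append(0.0)
        row = np.zeros(n); row[s] = -1.0; row[2 * n_states + s] = -1.0
        A_ub.append(row); b_ub.append(0.0)

    bounds = ([(-r_max, r_max)] * n_states
              + [(None, None)] * n_states
              + [(0, None)] * n_states)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=bounds, method="highs")
    return res.x[:n_states]
```

On a toy two-state MDP where action 0 stays put and action 1 swaps states, an expert who moves to state 1 and stays there yields a recovered reward that ranks state 1 above state 0, showing how the expert's choices constrain the (non-unique) reward. The POMDP case treated in the paper replaces states with beliefs, which is what makes both the constraints and their evaluation far harder.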