Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
Artificial Intelligence
The MAXQ Method for Hierarchical Reinforcement Learning
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Recent Advances in Hierarchical Reinforcement Learning
Discrete Event Dynamic Systems
Learning and discovery of predictive state representations in dynamical systems with reset
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Predictive state representations: a new theory for modeling dynamical systems
UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence
Learning predictive state representations in dynamical systems without reset
ICML '05 Proceedings of the 22nd international conference on Machine learning
Combining memory and landmarks with predictive state representations
IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Hierarchical solution of Markov decision processes using macro-actions
UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence
Coordinating with the Future: The Anticipatory Nature of Representation
Minds and Machines
Learning to make predictions in partially observable environments without a generative model
Journal of Artificial Intelligence Research
Hi-index | 0.00 |
Recent work on predictive state representation (PSR) models has focused on using predictions of the outcomes of open-loop action sequences as state. These predictions answer questions of the form "What is the probability of seeing observation sequence o1, o2, ..., oN if the agent takes action sequence a1, a2, ..., aN from some given history?" We would like to ask more expressive questions in our representation of state, such as "If I behave according to some policy until I terminate, what will be my last observation?" We extend the linear PSR framework to answer questions like these about options -- temporally extended, closed-loop courses of action -- bounding the size of the linear PSR needed to model questions about a certain class of options. We introduce a hierarchical PSR (HPSR) that can make predictions about both options and primitive action sequences and show empirical results from learning HPSRs in simple domains.