An improved policy iteration algorithm for partially observable MDPs
NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10
Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
Artificial Intelligence
Discrete-time, Discrete-valued Observable Operator Models: a Tutorial
Discrete-time, Discrete-valued Observable Operator Models: a Tutorial
Learning and discovery of predictive state representations in dynamical systems with reset
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Learning low dimensional predictive representations
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Predictive state representations: a new theory for modeling dynamical systems
UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence
Observable Operator Models for Discrete Stochastic Time Series
Neural Computation
Learning predictive state representations using non-blind policies
ICML '06 Proceedings of the 23rd international conference on Machine learning
Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems
Predictive representations for policy gradient in POMDPs
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Relational knowledge with predictive state representations
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Closing the learning-planning loop with predictive state representations
Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1 - Volume 1
Closing the learning-planning loop with predictive state representations
International Journal of Robotics Research
Hi-index | 0.00 |
Predictive State Representations (PSRs) have shown a great deal of promise as an alternative to Markov models. However, learning a PSR from a single stream of data generated from an environment remains a challenge. In this work, we present a formalism of PSRs and the domains they model. This formalization suggests an algorithm for learning PSRs that will (almost surely) converge to a globally optimal model given sufficient training data.