Predictive state representations (PSRs) are powerful models of non-Markovian decision processes that differ from traditional models (e.g., HMMs, POMDPs) by representing state using only observable quantities. Because of this, PSRs can be learned solely from data gathered by interacting with the process. Most existing techniques, however, explicitly or implicitly require that this data be gathered under a blind policy, in which actions are selected independently of preceding observations. This is a severe limitation for practical learning of PSRs. We present two methods for removing this limitation from most existing PSR algorithms: one for when the sampling policy is known and one for when it is not. We then present an efficient optimization for computing good exploration policies to be used when learning a PSR. These exploration policies, which are not blind, significantly lower the amount of data needed to build an accurate model, demonstrating the importance of non-blind policies.
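To make the "state from observable quantities" idea concrete, the following is a minimal sketch of the standard linear-PSR state update, where the state is the vector of predicted success probabilities of a set of core tests, p(Q | h). The parameter matrices below are illustrative made-up values, not quantities learned from data, and the exact learning procedures discussed in the abstract are not shown.

```python
import numpy as np

def psr_update(state, M_ao, m_ao):
    """One step of the standard linear-PSR update: p(Q|h) -> p(Q|hao)
    after taking action a and observing o.

    state: current prediction vector p(Q|h) over the core tests Q
    M_ao:  matrix whose rows linearly predict the one-step extensions
           of each core test (p(aoq_i | h) = row_i . state)
    m_ao:  weight vector linearly predicting p(ao | h)
    """
    denom = m_ao @ state            # p(ao | h): probability of seeing o after a
    return (M_ao @ state) / denom   # Bayes-rule renormalization gives p(Q | hao)

# Toy two-test PSR with hypothetical parameters
state = np.array([0.5, 0.25])                 # p(Q | h)
M_ao = np.array([[0.6, 0.2], [0.1, 0.7]])     # extension predictors
m_ao = np.array([0.4, 0.6])                   # p(ao | h) predictor
new_state = psr_update(state, M_ao, m_ao)
```

Note that every quantity in the update is a prediction about observable futures, which is why such a model can, in principle, be estimated directly from interaction data; the bias correction for non-blind sampling policies that the abstract describes is a separate issue not addressed by this sketch.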