Learning predictive state representations using non-blind policies

Authors:
Michael Bowling;Peter McCracken;Michael James;James Neufeld;Dana Wilkinson
Affiliations:
University of Alberta, Edmonton, Alberta, Canada;University of Alberta, Edmonton, Alberta, Canada;Toyota Technical Center, Ann Arbor, Michigan;University of Alberta, Edmonton, Alberta, Canada;University of Waterloo, Waterloo, Ontario, Canada
Venue:
ICML '06 Proceedings of the 23rd international conference on Machine learning
Year:
2006

References: 5
Cited by: 10

Learning and discovery of predictive state representations in dynamical systems with reset
ICML '04 Proceedings of the twenty-first international conference on Machine learning

Learning low dimensional predictive representations
ICML '04 Proceedings of the twenty-first international conference on Machine learning

Predictive state representations: a new theory for modeling dynamical systems
UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence

Learning predictive representations from a history
ICML '05 Proceedings of the 22nd international conference on Machine learning

Learning predictive state representations in dynamical systems without reset
ICML '05 Proceedings of the 22nd international conference on Machine learning

On-line discovery of temporal-difference networks
Proceedings of the 25th international conference on Machine learning

Episodic Reinforcement Learning by Logistic Reward-Weighted Regression
ICANN '08 Proceedings of the 18th international conference on Artificial Neural Networks, Part I

A bound on modeling error in observable operator models and an associated learning algorithm
Neural Computation

Discovery and learning of models with predictive state representations for dynamical systems without reset
Knowledge-Based Systems

Maintaining predictions over time without a model
IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence

Making the error-controlling algorithm of observable operator models constructive
Neural Computation

Closing the learning-planning loop with predictive state representations
Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Volume 1

Closing the learning-planning loop with predictive state representations
International Journal of Robotics Research

Learning to make predictions in partially observable environments without a generative model
Journal of Artificial Intelligence Research

Goal-Directed online learning of predictive models
EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning

Abstract

Predictive state representations (PSRs) are powerful models of non-Markovian decision processes that differ from traditional models (e.g., HMMs, POMDPs) by representing state using only observable quantities. Because of this, PSRs can be learned solely using data from interaction with the process. The majority of existing techniques, though, explicitly or implicitly require that this data be gathered using a blind policy, where actions are selected independently of preceding observations. This is a severe limitation for practical learning of PSRs. We present two methods for fixing this limitation in most of the existing PSR algorithms: one when the policy is known and one when it is not. We then present an efficient optimization for computing good exploration policies to be used when learning a PSR. The exploration policies, which are not blind, significantly lower the amount of data needed to build an accurate model, thus demonstrating the importance of non-blind policies.
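
To make the blind-policy issue concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the abstract does not give the estimators, so the two corrections shown (importance weighting by a known policy's action probabilities, and a product of one-step conditional estimates when the policy is unknown) are standard constructions used only for illustration. The toy environment, the policy, and all function names are hypothetical. Instead of sampled data, the sketch uses exact long-run trajectory frequencies, so the bias of the naive count ratio is visible without sampling noise.

```python
from fractions import Fraction
from itertools import product

def obs_prob(action, obs):
    # Toy environment (assumption): every observation is a fair coin flip,
    # regardless of the action taken.
    return Fraction(1, 2)

def policy_prob(prefix, action):
    # Hypothetical NON-blind policy: the first action is always "x"; later
    # actions depend on the most recent observation.
    if not prefix:
        return Fraction(1) if action == "x" else Fraction(0)
    p_x = Fraction(9, 10) if prefix[-1][1] == 1 else Fraction(1, 10)
    return p_x if action == "x" else 1 - p_x

def trajectory_distribution(length=2):
    # Exact relative frequency of every (action, observation) trajectory
    # that the policy can generate in the toy environment.
    dist = {}
    for steps in product(product("xy", (0, 1)), repeat=length):
        p = Fraction(1)
        for i, (a, o) in enumerate(steps):
            p *= policy_prob(steps[:i], a) * obs_prob(a, o)
        if p > 0:
            dist[steps] = p
    return dist

def naive_estimate(dist, test):
    # Count ratio that is only valid for blind policies:
    # freq(test occurs) / freq(test's action sequence occurs).
    n = len(test)
    actions = tuple(a for a, _ in test)
    num = sum(p for traj, p in dist.items() if traj[:n] == test)
    den = sum(p for traj, p in dist.items()
              if tuple(a for a, _ in traj[:n]) == actions)
    return num / den

def known_policy_estimate(dist, test):
    # Importance weighting: divide the frequency of the test by the
    # probability that the known policy chose the test's actions.
    n = len(test)
    weight = Fraction(1)
    for i, (a, _) in enumerate(test):
        weight *= policy_prob(test[:i], a)
    return sum(p for traj, p in dist.items() if traj[:n] == test) / weight

def unknown_policy_estimate(dist, test):
    # Product of one-step conditionals Pr(o_i | prefix, a_i), each estimated
    # by a count ratio; no policy probabilities are needed.
    est = Fraction(1)
    for i, (a, _) in enumerate(test):
        took = sum(p for traj, p in dist.items()
                   if traj[:i] == test[:i] and traj[i][0] == a)
        saw = sum(p for traj, p in dist.items() if traj[:i + 1] == test[:i + 1])
        est *= saw / took
    return est

dist = trajectory_distribution()
test = (("x", 1), ("x", 1))  # take x, see 1, take x, see 1
print(naive_estimate(dist, test))           # biased by the policy
print(known_policy_estimate(dist, test))    # matches the true prediction
print(unknown_policy_estimate(dist, test))  # matches the true prediction
```

In this toy setup the true prediction for the test is 1/2 × 1/2 = 1/4. The naive ratio is biased because the policy is more likely to repeat action x after observing 1, whereas both corrected estimates recover 1/4.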