Closing the learning-planning loop with predictive state representations

Authors:
Byron Boots; Sajid M Siddiqi; Geoffrey J Gordon
Affiliations:
Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA; Google, Inc., Pittsburgh, PA, USA; Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
Venue:
International Journal of Robotics Research
Year:
2011

Abstract

A central problem in artificial intelligence is to choose actions that maximize reward in a partially observable, uncertain environment. To do so, we must learn an accurate model of the environment and then plan with that model to maximize reward. Unfortunately, learning algorithms often recover a model that is too inaccurate to support planning, or too large and complex for planning to succeed; others require excessive prior domain knowledge or fail to provide guarantees such as statistical consistency. To address this gap, we propose a novel algorithm that provably learns a compact, accurate model directly from sequences of action-observation pairs. We then evaluate the learner by closing the loop from observations to actions. In more detail, we present a spectral algorithm for learning a predictive state representation (PSR) and evaluate it in a simulated, vision-based mobile robot planning task, showing that the learned PSR captures the essential features of the environment and enables successful and efficient planning. Our algorithm has several benefits that have not appeared together in any previous PSR learner: it is computationally efficient and statistically consistent; it handles high-dimensional observations and long time horizons; and our close-the-loop experiments provide an end-to-end practical test.