Learning to make predictions in partially observable environments without a generative model
Journal of Artificial Intelligence Research
A common approach to the control problem in partially observable environments is to perform a direct search in policy space, where policies are defined over some set of features of history. In this paper we consider predictive features, whose values are conditional probabilities of future events given history. Since predictive features provide direct information about the agent's future, they have a number of advantages for control. However, unlike more typical features defined directly over past observations, it is not clear how to maintain the values of predictive features over time. A model could be used, since a model can make any prediction about the future, but in many cases learning a model is infeasible. In this paper we demonstrate that in some cases it is possible to learn to maintain the values of a set of predictive features even when learning a model is infeasible, and that natural predictive features can be useful for policy-search methods.
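To make the idea concrete, the following is a minimal sketch of maintaining a vector of predictive features with a linear, PSR-style update and then defining a policy over those features. It is an illustration under assumed dynamics, not the paper's algorithm: the parameter arrays `M_update` and `m_norm`, the update rule, and the softmax policy weights are all hypothetical placeholders standing in for quantities that would be learned from data.

```python
import numpy as np

rng = np.random.default_rng(0)

n_feats = 3   # number of predictive features (test probabilities)
n_ao = 2      # number of (action, observation) pairs

# Hypothetical learned parameters: one update matrix and one
# normalizer vector per (action, observation) pair.
M_update = rng.uniform(0.1, 0.9, size=(n_ao, n_feats, n_feats))
m_norm = rng.uniform(0.1, 0.9, size=(n_ao, n_feats))

def step(p, ao):
    """Update the predictive feature vector p after observing pair ao.

    PSR-style linear update: p' = (M_ao p) / (m_ao . p), i.e. each
    feature is re-conditioned on the extended history.
    """
    return (M_update[ao] @ p) / (m_norm[ao] @ p)

# Maintain the features across a short action-observation sequence.
p = np.full(n_feats, 1.0 / n_feats)
for ao in [0, 1, 0]:
    p = step(p, ao)

# A policy defined directly over the predictive features: softmax of a
# linear score (weights w are arbitrary here, found by policy search).
w = rng.normal(size=(2, n_feats))
scores = w @ p
policy = np.exp(scores - scores.max())
policy /= policy.sum()
print(policy)
```

The point of the sketch is that once the update `step` can be learned, the agent never needs a full generative model: it only tracks the feature values it actually uses to act.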