Learning predictive state representations using non-blind policies

  • Authors:
  • Michael Bowling; Peter McCracken; Michael James; James Neufeld; Dana Wilkinson

  • Affiliations:
  • University of Alberta, Edmonton, Alberta, Canada; University of Alberta, Edmonton, Alberta, Canada; Toyota Technical Center, Ann Arbor, Michigan; University of Alberta, Edmonton, Alberta, Canada; University of Waterloo, Waterloo, Ontario, Canada

  • Venue:
  • ICML '06: Proceedings of the 23rd International Conference on Machine Learning
  • Year:
  • 2006

Abstract

Predictive state representations (PSRs) are powerful models of non-Markovian decision processes that differ from traditional models (e.g., HMMs, POMDPs) by representing state using only observable quantities. Because of this, PSRs can be learned solely from data gathered by interacting with the process. The majority of existing techniques, though, explicitly or implicitly require that this data be gathered using a blind policy, where actions are selected independently of preceding observations. This is a severe limitation for practical learning of PSRs. We present two methods that remove this limitation from most existing PSR algorithms: one for when the policy is known and one for when it is not. We then present an efficient optimization for computing good exploration policies to use when learning a PSR. These exploration policies, which are not blind, significantly lower the amount of data needed to build an accurate model, thus demonstrating the importance of non-blind policies.
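To make the blind versus non-blind distinction concrete, the following is a minimal sketch on an invented two-step process; it is not the paper's algorithm. It estimates the probability of a test (observing o1 then o2 when actions a1 then a2 are taken) in two ways. The naive estimator divides the count of the full action-observation sequence by the count of the action sequence, which is correct only when actions ignore past observations. The weighted estimator divides out the policy's action probabilities, a standard importance-weighting correction that assumes the policy is known; the paper's own known-policy method may differ in detail. The dynamics (`p_obs1`, `p_obs2`), the policies (`blind`, `nonblind`), and all function names are hypothetical, and the code computes exact expected values with `Fraction` instead of sampling, so each result is what the corresponding estimator converges to with unlimited data.

```python
from fractions import Fraction
from itertools import product

ACTIONS = OBS = (0, 1)

def p_obs1(o1, a1):
    # Hypothetical dynamics: the first observation is a fair coin, independent of a1.
    return Fraction(1, 2)

def p_obs2(o2, o1, a2):
    # Hypothetical dynamics: action 1 re-reads o1 with 90% accuracy; action 0 yields a fair coin.
    if a2 == 1:
        return Fraction(9, 10) if o2 == o1 else Fraction(1, 10)
    return Fraction(1, 2)

def blind(a, history):
    # Blind policy: every action has probability 1/2, whatever was observed.
    return Fraction(1, 2)

def nonblind(a, history):
    # Non-blind policy (hypothetical): the second action depends on o1 = history[1].
    if not history:
        return Fraction(1, 2)
    p_one = Fraction(9, 10) if history[1] == 1 else Fraction(1, 10)
    return p_one if a == 1 else 1 - p_one

def trajectories(policy):
    """Yield ((a1, o1, a2, o2), probability) for every length-2 trajectory."""
    for a1, o1, a2, o2 in product(ACTIONS, OBS, ACTIONS, OBS):
        p = (policy(a1, ()) * p_obs1(o1, a1)
             * policy(a2, (a1, o1)) * p_obs2(o2, o1, a2))
        yield (a1, o1, a2, o2), p

def true_test_prob(test):
    # Ground truth: probability of o1, o2 when a1, a2 are simply executed.
    a1, o1, a2, o2 = test
    return p_obs1(o1, a1) * p_obs2(o2, o1, a2)

def naive_estimate(test, policy):
    # Expected count ratio: #(a1 o1 a2 o2) / #(trajectories whose actions are a1, a2).
    a1, _, a2, _ = test
    num = den = Fraction(0)
    for traj, p in trajectories(policy):
        if (traj[0], traj[2]) == (a1, a2):
            den += p
            if traj == test:
                num += p
    return num / den

def weighted_estimate(test, policy):
    # Expected value of 1[trajectory == test] / (pi(a1) * pi(a2 | a1, o1)).
    total = Fraction(0)
    for traj, p in trajectories(policy):
        if traj == test:
            a1, o1, a2, _ = traj
            total += p / (policy(a1, ()) * policy(a2, (a1, o1)))
    return total

test = (0, 1, 1, 1)
print(true_test_prob(test))               # 9/20
print(naive_estimate(test, blind))        # 9/20
print(naive_estimate(test, nonblind))     # 81/100 (biased)
print(weighted_estimate(test, nonblind))  # 9/20
```

Under the non-blind policy, action 1 is chosen mostly after o1 = 1, so the naive ratio over-represents that history and overestimates the test probability. Dividing by the action probabilities removes this dependence, but only because the policy is known here; when it is not (the abstract's second case), those probabilities would have to be estimated from data, which this sketch does not attempt.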