Qualitative reinforcement learning

  • Authors:
  • Arkady Epshteyn;Gerald DeJong

  • Affiliations:
  • University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL

  • Venue:
  • ICML '06 Proceedings of the 23rd international conference on Machine learning
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

When the transition probabilities and rewards of a Markov Decision Process are specified exactly, the problem can be solved without any interaction with the environment. When no such specification is available, the agent's only recourse is a long and potentially dangerous exploration. We present a framework which allows the expert to specify imprecise knowledge of transition probabilities in terms of stochastic dominance constraints. Our algorithm can be used to find optimal policies for qualitatively specified problems, or, when no such solution is available, to decrease the required amount of exploration. The algorithm's behavior is demonstrated on simulations of two classic problems: mountain car ascent and cart pole balancing.