ε-MDPs: Learning in Varying Environments

  • Authors:
  • István Szita; Bálint Takács; András Lőrincz

  • Affiliations:
  • Department of Information Systems, Eötvös Loránd University, Pázmány Péter sétány 1/C, Budapest, Hungary H-1117

  • Venue:
  • The Journal of Machine Learning Research
  • Year:
  • 2003

Abstract

In this paper, ε-MDP models are introduced and convergence theorems are proven using the generalized MDP framework of Szepesvári and Littman. Using this model family, we show that Q-learning is capable of finding near-optimal policies in varying environments. The potential of this new family of MDP models is illustrated via a reinforcement learning algorithm called event-learning, which separates the optimization of decision making from the controller. We show that event-learning augmented by a particular controller, which gives rise to an ε-MDP, enables near-optimal performance even when considerable and sudden changes occur in the environment. Illustrations are provided on the two-segment pendulum problem.
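To make the setting concrete, below is a minimal sketch of plain tabular Q-learning in a small chain MDP whose transition noise drifts slowly within a fixed band, loosely mirroring the "varying environment" idea in the abstract. The chain environment, the drift model, and all parameter values are illustrative assumptions, not the paper's construction (the paper's experiments use the two-segment pendulum and the event-learning algorithm).

```python
# Sketch only: generic tabular Q-learning in a slowly varying chain MDP.
# Environment, drift bound, and parameters are assumptions for illustration.
import random

N_STATES, N_ACTIONS = 5, 2     # small chain MDP: move left (0) or right (1)
GAMMA, ALPHA, EPS_GREEDY = 0.95, 0.1, 0.1
DRIFT = 0.05                   # bound on how far the transition noise may wander

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(state, action, slip):
    """One transition; `slip` is the time-varying probability that the action fails."""
    move = 1 if action == 1 else -1
    if random.random() < slip:            # perturbation stays inside the fixed band
        move = -move
    next_state = min(max(state + move, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

state, slip = 0, 0.0
for t in range(20000):
    # let the transition noise drift slowly, but keep it inside the band
    slip = min(max(slip + random.uniform(-0.001, 0.001), 0.0), DRIFT)

    # epsilon-greedy action selection
    if random.random() < EPS_GREEDY:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])

    next_state, reward = step(state, action, slip)

    # standard Q-learning update; under bounded perturbations the values settle
    # into a neighbourhood of the optimum rather than a single fixed point
    target = reward + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (target - Q[state][action])

    state = 0 if next_state == N_STATES - 1 else next_state  # restart at the goal

print("Greedy policy:", [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)])
```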