Probably approximately correct (PAC) exploration in reinforcement learning

  • Authors:
  • Michael Littman; Alexander L. Strehl

  • Affiliations:
  • Rutgers, The State University of New Jersey - New Brunswick; Rutgers, The State University of New Jersey - New Brunswick

  • Venue:
  • Probably approximately correct (PAC) exploration in reinforcement learning
  • Year:
  • 2007

Abstract

Reinforcement Learning (RL) in finite state and action Markov Decision Processes (MDPs) is studied with an emphasis on the well-studied exploration problem. We provide a general RL framework that applies to all results in this thesis and to other results in RL that generalize the finite MDP assumption. We present two new versions of the Model-Based Interval Estimation (MBIE) algorithm and prove that they are both PAC-MDP. These algorithms are provably more efficient than any previously studied RL algorithms. We prove that many model-based algorithms (including R-MAX and MBIE) can be modified so that their worst-case per-step computational complexity is vastly improved without sacrificing their attractive theoretical guarantees. We show that it is possible to obtain PAC-MDP bounds with a model-free algorithm called Delayed Q-learning.
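
As a rough illustration of the model-free result mentioned above, the sketch below shows the core update rule of Delayed Q-learning: Q-values are initialized optimistically, update targets are accumulated in batches of m samples, and an update is applied only if it would lower the estimate by at least 2·eps1 (plus an eps1 bonus to preserve optimism). This is a simplified sketch, not code from the thesis; the parameter names (nS, nA, gamma, m, eps1) are assumed, rewards are assumed to lie in [0, 1], and the full algorithm's LEARN-flag bookkeeping, which bounds the number of attempted updates, is omitted for brevity.

```python
import numpy as np
from collections import defaultdict

class DelayedQLearning:
    """Simplified sketch of the Delayed Q-learning update rule."""

    def __init__(self, nS, nA, gamma=0.95, m=20, eps1=0.1):
        self.gamma, self.m, self.eps1 = gamma, m, eps1
        # Optimistic initialization: every Q-value starts at the maximum
        # possible return 1 / (1 - gamma), which drives exploration.
        self.Q = np.full((nS, nA), 1.0 / (1.0 - gamma))
        self.U = defaultdict(float)  # accumulated update targets per (s, a)
        self.l = defaultdict(int)    # samples gathered since last attempted update

    def act(self, s):
        # Greedy action with respect to the (optimistic) Q-values.
        return int(np.argmax(self.Q[s]))

    def observe(self, s, a, r, s_next):
        # Accumulate the usual Q-learning target instead of applying it immediately.
        self.U[(s, a)] += r + self.gamma * np.max(self.Q[s_next])
        self.l[(s, a)] += 1
        if self.l[(s, a)] == self.m:
            target = self.U[(s, a)] / self.m
            # Attempted update: change Q(s, a) only if it would drop by at
            # least 2 * eps1; otherwise the current estimate is kept.
            if self.Q[s, a] - target >= 2.0 * self.eps1:
                self.Q[s, a] = target + self.eps1
            self.U[(s, a)] = 0.0
            self.l[(s, a)] = 0
```

The delayed, thresholded update is what makes a PAC-style analysis possible for a model-free method: each state-action pair can only undergo a bounded number of meaningful value decreases before its estimate is accurate, so the total amount of suboptimal exploration can be bounded.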