Efficient PAC Learning for Episodic Tasks with Acyclic State Spaces

  • Authors:
  • Spyros Reveliotis; Theologos Bountourelis

  • Affiliations:
  • School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, USA (both authors)

  • Venue:
  • Discrete Event Dynamic Systems
  • Year:
  • 2007


Abstract

This paper considers the problem of computing an optimal policy for a Markov decision process in the absence of complete a priori knowledge of (1) the branching probability distributions that determine the evolution of the process state upon the execution of the different actions, and (2) the probability distributions characterizing the immediate rewards returned by the environment as a result of the execution of these actions at the different states of the process. In addition, it is assumed that the underlying process evolves in a repetitive, episodic manner, with each episode starting from a well-defined initial state and evolving over an acyclic state space. A novel efficient algorithm for this problem is proposed, and its convergence properties and computational complexity are rigorously characterized in the formal framework of computational learning theory. Furthermore, in deriving these results, the presented work generalizes Bechhofer's "indifference-zone" approach to the ranking and selection problem, which arises in statistical inference theory, so that it applies to populations with bounded general distributions.
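
To make the problem setting concrete: when the transition and reward models are fully known, acyclicity of the state space means an optimal episodic policy can be computed by a single backward pass over the states in reverse topological order, rather than by iterating to a fixed point. The sketch below illustrates this baseline computation; the `mdp` structure, the state names, and the helper functions are hypothetical illustrations and are not the paper's notation or its learning algorithm, which operates without prior knowledge of these distributions.

```python
# A minimal sketch of optimal-policy computation for an episodic task with an
# acyclic state space, assuming the transition and reward models are KNOWN.
# (The paper's contribution is the learning setting where they are not.)

from collections import defaultdict

# mdp[state][action] = list of (next_state, probability, expected_reward);
# terminal states map to an empty action dict.
mdp = {
    "s0": {"a": [("s1", 0.7, 1.0), ("s2", 0.3, 0.0)],
           "b": [("s2", 1.0, 0.5)]},
    "s1": {"a": [("s3", 1.0, 2.0)]},
    "s2": {"a": [("s3", 1.0, 1.0)]},
    "s3": {},  # terminal state
}

def topological_order(mdp):
    """Order states so every transition goes forward (Kahn's algorithm)."""
    indeg = defaultdict(int)
    for s, acts in mdp.items():
        indeg.setdefault(s, 0)
        for outcomes in acts.values():
            for (t, _, _) in outcomes:
                indeg[t] += 1
    frontier = [s for s, d in indeg.items() if d == 0]
    order = []
    while frontier:
        s = frontier.pop()
        order.append(s)
        for outcomes in mdp[s].values():
            for (t, _, _) in outcomes:
                indeg[t] -= 1
                if indeg[t] == 0:
                    frontier.append(t)
    return order

def backward_induction(mdp):
    """One sweep in reverse topological order yields V* and an optimal policy."""
    V, policy = {}, {}
    for s in reversed(topological_order(mdp)):
        if not mdp[s]:       # terminal: no future reward
            V[s] = 0.0
            continue
        best_a, best_q = None, float("-inf")
        for a, outcomes in mdp[s].items():
            q = sum(p * (r + V[t]) for (t, p, r) in outcomes)
            if q > best_q:
                best_a, best_q = a, q
        V[s], policy[s] = best_q, best_a
    return V, policy

V, policy = backward_induction(mdp)
print(V["s0"], policy)   # optimal value of the initial state and the policy
```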
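
The abstract's final claim concerns extending the indifference-zone approach to populations with bounded general distributions. One standard, distribution-free way to obtain such PAC-style guarantees for bounded observations is Hoeffding's inequality combined with a union bound; the sketch below computes a per-population sample size under that reasoning. All function names and parameters here are illustrative assumptions, and this should be read as the flavor of such a guarantee, not the paper's actual procedure or sample-complexity bound.

```python
# A hedged sketch of an "indifference-zone" sample-size rule for selecting the
# population with the largest mean among k populations whose observations are
# bounded in [0, 1].  Based on Hoeffding's inequality plus a union bound; this
# is an illustrative assumption, not the paper's actual procedure.

import math
import random

def samples_per_population(k, delta, alpha):
    """Return n such that, with probability >= 1 - alpha, every empirical mean
    lies within delta/2 of its true mean (Hoeffding + union bound over k
    populations).  The empirically best population is then within delta of
    optimal, so correct selection holds whenever the true gap exceeds the
    indifference-zone width delta."""
    return math.ceil((2.0 / delta**2) * math.log(2.0 * k / alpha))

def select_best(populations, delta, alpha):
    """populations: list of zero-argument samplers returning values in [0, 1]."""
    n = samples_per_population(len(populations), delta, alpha)
    means = [sum(draw() for _ in range(n)) / n for draw in populations]
    return max(range(len(means)), key=means.__getitem__), n

# Usage: three hypothetical bounded reward sources with means 0.4, 0.5, 0.6.
rng = random.Random(1)
pops = [lambda m=m: min(1.0, max(0.0, rng.gauss(m, 0.2))) for m in (0.4, 0.5, 0.6)]
best, n = select_best(pops, delta=0.05, alpha=0.05)
print(f"selected population {best} after {n} samples each")
```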