Polynomial-time reinforcement learning of near-optimal policies

  • Authors:
  • Karèn Pivazyan; Yoav Shoham

  • Affiliations:
  • Management Science and Engineering Department, Stanford University, Stanford, CA; Computer Science Department, Stanford University, Stanford, CA

  • Venue:
  • Eighteenth National Conference on Artificial Intelligence
  • Year:
  • 2002

Abstract

Inspired by recent results on polynomial-time reinforcement learning algorithms that accumulate near-optimal rewards, we study the related problem of quickly learning near-optimal policies. The new problem is closely related to the previous one but differs in important ways. We provide simple algorithms for MDPs and for zero-sum and common-payoff stochastic games, together with a uniform framework for proving their polynomial complexity. Unlike the bounds for the previously studied problem, our bounds depend on the minimum of the mixing time and a new quantity, the spectral radius. Also unlike the previous results, ours apply uniformly to both the average-reward and discounted-reward cases.
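
The paper's algorithms are not reproduced in this listing. As a rough illustration of the underlying setting only, the sketch below computes a near-optimal policy for a small, fully known discounted MDP via value iteration; all numbers, the function name, and the stopping rule are illustrative assumptions, not the authors' method.

```python
import numpy as np

def near_optimal_policy(P, R, gamma=0.95, eps=1e-6):
    """Compute an eps-optimal policy for a known discounted MDP.

    P: transitions of shape (A, S, S), P[a, s, s'] = Pr(s' | s, a).
    R: rewards of shape (S, A).
    Stopping when successive value functions differ by less than
    eps * (1 - gamma) / (2 * gamma) guarantees the greedy policy
    is within eps of optimal (standard value-iteration bound).
    """
    S, A = R.shape
    V = np.zeros(S)
    threshold = eps * (1 - gamma) / (2 * gamma)
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < threshold:
            return Q.argmax(axis=1), V_new
        V = V_new

# Tiny 2-state, 2-action example with hypothetical numbers.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.3, 0.7]]])  # action 1
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
policy, V = near_optimal_policy(P, R)
print("greedy policy:", policy, "values:", V)
```

In the learning problem the paper addresses, P and R are unknown and must be estimated from interaction; the contribution is bounding how many samples suffice before such a computed policy is near-optimal.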