Learning and planning in environments with delayed feedback

  • Authors:
  • Thomas J. Walsh;Ali Nouri;Lihong Li;Michael L. Littman

  • Affiliations:
  • Department of Computer Science, Rutgers University, Piscataway, NJ 08854, USA (all authors)

  • Venue:
  • Autonomous Agents and Multi-Agent Systems
  • Year:
  • 2009

Quantified Score

Hi-index 0.02

Abstract

This work considers the problems of learning and planning in Markovian environments with constant observation and reward delays. We provide a hardness result for the general planning problem and positive results for several special cases with deterministic or otherwise constrained dynamics. We present an algorithm, Model Based Simulation, for planning in such environments and use model-based reinforcement learning to extend this approach to the learning setting in both finite and continuous environments. Empirical comparisons show this algorithm holds significant advantages over others for decision making in delayed-observation environments.
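
The abstract names Model Based Simulation but does not describe its steps. As a rough illustration only, and not the authors' implementation, the sketch below shows one way a known deterministic model can compensate for a constant observation delay: the agent remembers the actions it has taken since the delayed observation, rolls that observation forward through the model to estimate its current state, and then applies a policy designed for the undelayed problem. The chain environment, the `step` model, the `policy` function, and all parameter values are hypothetical choices made for this example.

```python
from collections import deque


def step(state, action):
    """Hypothetical deterministic model: a walk on positions 0..10."""
    return max(0, min(10, state + action))


def policy(state, goal=7):
    """Policy for the undelayed problem: move toward the goal, then stay."""
    if state < goal:
        return 1
    if state > goal:
        return -1
    return 0


def estimate_current_state(model, delayed_state, pending_actions):
    """Roll the delayed observation forward through the actions taken since."""
    state = delayed_state
    for action in pending_actions:
        state = model(state, action)
    return state


def run_episode(delay, steps=15, start=2):
    """Simulate an agent that only sees the state from `delay` steps ago."""
    history = [start]               # true states, hidden from the agent
    pending = deque(maxlen=delay)   # actions taken since the observed state
    estimates = []
    for t in range(steps):
        observed = history[max(0, t - delay)]  # delayed observation
        estimate = estimate_current_state(step, observed, pending)
        estimates.append(estimate)
        action = policy(estimate)
        pending.append(action)
        history.append(step(history[-1], action))
    return history, estimates


if __name__ == "__main__":
    history, estimates = run_episode(delay=3)
    print("true states:", history)
    print("estimates:  ", estimates)
```

Because the assumed model is deterministic and exact, the estimate matches the true current state at every step, so the delayed agent behaves like an undelayed one. With stochastic dynamics this exact recovery no longer holds, which is consistent with the abstract reporting a hardness result for the general planning problem.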