Planning and Learning in Environments with Delayed Feedback

  • Authors:
  • Thomas J. Walsh; Ali Nouri; Lihong Li; Michael L. Littman

  • Affiliations:
  • Rutgers, The State University of New Jersey, Department of Computer Science, 110 Frelinghuysen Rd., Piscataway, NJ 08854 (all four authors)

  • Venue:
  • ECML '07: Proceedings of the 18th European Conference on Machine Learning
  • Year:
  • 2007

Abstract

This work considers the problems of planning and learning in environments with constant observation and reward delays. We provide a hardness result for the general planning problem and positive results for several special cases with deterministic or otherwise constrained dynamics. We present an algorithm, Model Based Simulation, for planning in such environments and use model-based reinforcement learning to extend this approach to the learning setting in both finite and continuous environments. Empirical comparisons show that this algorithm holds significant advantages over other approaches for decision making in delayed environments.
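The deterministic special case mentioned above can be illustrated with a minimal sketch. It assumes a constant observation delay of `delay` steps, a known deterministic transition function, and a policy that is good for the undelayed problem; the names `DelayedAgent`, `step`, `policy`, and `predict_current_state` are hypothetical. Because the agent sees the state from `delay` steps ago and remembers the actions it has issued since, it can roll the model forward to recover the current state exactly and then act as if there were no delay. This shows the general idea of planning by simulating through a model; it is not the authors' implementation of Model Based Simulation, which the abstract also applies to learned models and continuous environments.

```python
from collections import deque
from typing import Callable, Deque, Hashable

State = Hashable
Action = Hashable


class DelayedAgent:
    """Hypothetical agent for a constant observation delay of `delay` steps.

    Assumes deterministic dynamics `step(state, action) -> next_state` and a
    policy `policy(state) -> action` designed for the *undelayed* problem.
    During the first `delay` steps the observation is assumed to be the
    initial state.
    """

    def __init__(self, step: Callable[[State, Action], State],
                 policy: Callable[[State], Action], delay: int) -> None:
        self.step = step
        self.policy = policy
        self.delay = delay
        # Actions already sent whose effects are not yet visible in the observation.
        self.pending: Deque[Action] = deque()

    def predict_current_state(self, observed_state: State) -> State:
        """Roll the model forward from the delayed observation through the pending actions."""
        state = observed_state
        for action in self.pending:
            state = self.step(state, action)
        return state

    def act(self, observed_state: State) -> Action:
        """Choose an action for the predicted current state and record it as pending."""
        action = self.policy(self.predict_current_state(observed_state))
        self.pending.append(action)
        # Once the buffer exceeds `delay` actions, the oldest one is reflected in
        # the next observation, so drop it.
        if len(self.pending) > self.delay:
            self.pending.popleft()
        return action
```

For example, on a toy 1-D chain with `step = lambda s, a: s + a` and a policy that moves toward position 5, the agent with `delay=3` predicts the true state at every step and stops at the goal, whereas applying the same policy directly to the stale observation overshoots it.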