Planning and Learning in Environments with Delayed Feedback

Authors:
Thomas J. Walsh; Ali Nouri; Lihong Li; Michael L. Littman
Affiliations:
Rutgers, The State University of New Jersey, Department of Computing Science, 110 Frelinghuysen Rd., Piscataway, NJ 08854 (all authors)
Venue:
ECML '07 Proceedings of the 18th European conference on Machine Learning
Year:
2007

Abstract

This work considers the problems of planning and learning in environments with constant observation and reward delays. We provide a hardness result for the general planning problem and positive results for several special cases with deterministic or otherwise constrained dynamics. We present an algorithm, Model Based Simulation, for planning in such environments, and we use model-based reinforcement learning to extend this approach to the learning setting in both finite and continuous environments. Empirical comparisons show that this algorithm holds significant advantages over other approaches to decision making in delayed environments.
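The abstract names Model Based Simulation without spelling out its mechanics. The sketch below assumes the core idea is to take the most recent (delayed) observation, replay the actions issued since then through a transition model to estimate the current state, and then act with a policy computed for the undelayed problem. It covers only a deterministic, known model. The names `ModelBasedSimulationAgent` and `run_delayed`, and the convention that the environment reports the start state until real observations arrive, are hypothetical choices made for illustration, not the paper's code.

```python
from collections import deque
from typing import Callable, Deque, Hashable, List

State = Hashable
Action = Hashable


class ModelBasedSimulationAgent:
    """Hypothetical sketch: act under a constant observation delay of k steps
    by rolling the last observed state forward through a transition model."""

    def __init__(
        self,
        model: Callable[[State, Action], State],
        policy: Callable[[State], Action],
        k: int,
    ) -> None:
        self.model = model    # assumed deterministic model: model(s, a) -> next state
        self.policy = policy  # assumed good policy for the *undelayed* problem
        # The k most recent actions, i.e. those whose effects are not yet observed.
        self.pending: Deque[Action] = deque(maxlen=k)

    def act(self, delayed_obs: State) -> Action:
        # Replay the pending actions from the delayed observation to
        # estimate the state the agent is actually in right now.
        s = delayed_obs
        for a in self.pending:
            s = self.model(s, a)
        action = self.policy(s)
        self.pending.append(action)  # oldest action drops out once k are buffered
        return action


def run_delayed(agent, model: Callable[[State, Action], State],
                s0: State, k: int, steps: int) -> List[State]:
    """Hypothetical environment: the observation at time t is the true state
    at time max(0, t - k). Returns the true state trajectory."""
    history: List[State] = [s0]
    for t in range(steps):
        obs = history[max(0, t - k)]
        a = agent.act(obs)
        history.append(model(history[-1], a))
    return history


if __name__ == "__main__":
    def step(s: int, a: int) -> int:
        return s + a  # toy 1-D chain dynamics

    def toward_goal(s: int) -> int:
        return (s < 5) - (s > 5)  # +1 below state 5, -1 above, 0 at the goal

    agent = ModelBasedSimulationAgent(step, toward_goal, k=3)
    print(run_delayed(agent, step, 0, k=3, steps=8))
    # prints [0, 1, 2, 3, 4, 5, 5, 5, 5]
```

With a delay of k = 3 on this toy chain, the simulated agent stops exactly at state 5. An agent that applies the same policy directly to the delayed observation keeps pushing for three extra steps and overshoots to state 8, which is the failure mode that simulating the pending actions avoids.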