Learning and planning in environments with delayed feedback

Authors:
Thomas J. Walsh;Ali Nouri;Lihong Li;Michael L. Littman
Affiliations:
Department of Computer Science, Rutgers University, Piscataway, USA 08854 (all authors)
Venue:
Autonomous Agents and Multi-Agent Systems
Year:
2009

Abstract

This work considers the problems of learning and planning in Markovian environments with constant observation and reward delays. We provide a hardness result for the general planning problem and positive results for several special cases with deterministic or otherwise constrained dynamics. We present an algorithm, Model Based Simulation, for planning in such environments and use model-based reinforcement learning to extend this approach to the learning setting in both finite and continuous environments. Empirical comparisons show this algorithm holds significant advantages over others for decision making in delayed-observation environments.
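
The planning idea named in the abstract can be illustrated with a minimal sketch. It assumes a deterministic one-step model and a constant observation delay of k steps: the agent sees the state from k steps ago, remembers the k actions it has taken since, rolls the model forward through those actions to estimate the current state, and then acts with a policy computed for the undelayed problem. The abstract does not spell out the algorithm's details, so the function names, the toy chain environment, and the policy below are all hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Callable, Hashable, Iterable

State = Hashable
Action = Hashable


def simulate_current_state(
    delayed_state: State,
    pending_actions: Iterable[Action],
    model: Callable[[State, Action], State],
) -> State:
    """Roll a one-step model forward from the last observed (delayed) state.

    pending_actions holds the actions taken since delayed_state was
    generated, oldest first. The model is assumed deterministic here.
    """
    state = delayed_state
    for action in pending_actions:
        state = model(state, action)
    return state


def act_with_delay(
    delayed_state: State,
    pending_actions: Iterable[Action],
    model: Callable[[State, Action], State],
    policy: Callable[[State], Action],
) -> Action:
    """Pick an action by applying an undelayed policy to the simulated state."""
    estimate = simulate_current_state(delayed_state, pending_actions, model)
    return policy(estimate)


# Hypothetical toy environment: a chain of states 0..10 with a goal at 5.
def chain_model(state: int, action: int) -> int:
    """Move by action (-1, 0, or +1), clipped to the chain's ends."""
    return max(0, min(10, state + action))


def chain_policy(state: int) -> int:
    """Undelayed policy: step toward state 5 and stay once there."""
    if state < 5:
        return 1
    if state > 5:
        return -1
    return 0


if __name__ == "__main__":
    # Delay k = 3: the agent observes state 2 but has moved right three times.
    pending = deque([1, 1, 1], maxlen=3)
    print(simulate_current_state(2, pending, chain_model))
    print(act_with_delay(2, pending, chain_model, chain_policy))
```

In this toy run the agent reasons about its simulated position rather than the stale observation, so it does not overshoot the goal. With stochastic dynamics a single rollout like this is only an approximation, which is consistent with the abstract restricting its positive results to deterministic or otherwise constrained dynamics.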