This work considers the problems of planning and learning in environments with constant observation and reward delays. We provide a hardness result for the general planning problem and positive results for several special cases with deterministic or otherwise constrained dynamics. We present an algorithm, Model Based Simulation, for planning in such environments and use model-based reinforcement learning to extend this approach to the learning setting in both finite and continuous environments. Empirical comparisons show this algorithm holds significant advantages over others for decision making in delayed environments.
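The abstract does not spell out how Model Based Simulation works, so the following is only a minimal sketch of one plausible reading, under stated assumptions: the delay is a known constant, a deterministic transition model is available, and a policy for the undelayed problem already exists. The idea illustrated is to take the most recent (delayed) observation, replay the actions chosen since then through the model to estimate the current state, and act on that estimate. The names `simulate_current_state`, `DelayedAgent`, `model`, and `policy` are hypothetical and do not come from the paper.

```python
from collections import deque
from typing import Callable, Deque, Hashable, Iterable

State = Hashable
Action = Hashable


def simulate_current_state(
    delayed_state: State,
    pending_actions: Iterable[Action],
    model: Callable[[State, Action], State],
) -> State:
    """Roll the delayed observation forward through the model.

    Assumes `model` is a deterministic one-step transition function
    (an assumption of this sketch, not a claim about the paper).
    """
    state = delayed_state
    for action in pending_actions:
        state = model(state, action)
    return state


class DelayedAgent:
    """Acts on a simulated estimate of the current state.

    `delay` is the assumed constant observation delay, in steps.
    `policy` maps an (estimated) current state to an action.
    """

    def __init__(
        self,
        delay: int,
        model: Callable[[State, Action], State],
        policy: Callable[[State], Action],
    ) -> None:
        self.model = model
        self.policy = policy
        # Keeps only the last `delay` actions: exactly those whose
        # effects are not yet visible in the delayed observation.
        self.pending: Deque[Action] = deque(maxlen=delay)

    def estimate(self, delayed_state: State) -> State:
        """Estimate the current state from the delayed observation."""
        return simulate_current_state(delayed_state, self.pending, self.model)

    def act(self, delayed_state: State) -> Action:
        """Choose an action for the estimated current state and record it."""
        action = self.policy(self.estimate(delayed_state))
        self.pending.append(action)
        return action
```

In this sketch the agent is expected to receive the initial state until the first delayed observation arrives, so that the queue of pending actions always matches the gap between the observed and the current state. With stochastic dynamics the rollout would yield only one possible state rather than the true one, which is where the deterministic assumption matters.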