Random early detection gateways for congestion avoidance
IEEE/ACM Transactions on Networking (TON)
The Markov-modulated Poisson process (MMPP) cookbook
Performance Evaluation
Dynamic Programming and Optimal Control
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Neuro-Dynamic Programming
Rollout Algorithms for Stochastic Scheduling Problems
Journal of Heuristics
Scheduling Multiclass Packet Streams to Minimize Weighted Loss
Queueing Systems: Theory and Applications
Proceedings of IEEE INFOCOM '97 (Sixteenth Annual Joint Conference of the IEEE Computer and Communications Societies)
On-line sampling-based control for network queueing problems
Dynamics of TCP traffic over ATM networks
IEEE Journal on Selected Areas in Communications
Partially Observable Markov Decision Process Approximations for Adaptive Sensing
Discrete Event Dynamic Systems
Online planning algorithms for POMDPs
Journal of Artificial Intelligence Research
Parallelizing parallel rollout algorithm for solving Markov decision processes
Proceedings of WOMPAT '03 (International Workshop on OpenMP Applications and Tools)
A policy improvement method for constrained average Markov decision processes
Operations Research Letters
We propose a novel approach, called parallel rollout, to solving (partially observable) Markov decision processes. Our approach generalizes the rollout algorithm of Bertsekas and Castanon (1999) by rolling out a set of multiple heuristic policies rather than a single policy. In particular, parallel rollout targets the class of problems for which multiple heuristic policies are available, each performing near-optimally on a different set of system paths. Parallel rollout automatically combines the given policies into a new policy that adapts across the different system paths and improves on the performance of every policy in the set. We formally prove this claim for two criteria: total expected reward and infinite-horizon discounted reward. The parallel rollout approach also resolves the key issue of deciding which policy to roll out when the relative performance of the candidate heuristic policies cannot be predicted in advance. We present two example problems to illustrate the effectiveness of the approach: a buffer management problem and a multiclass scheduling problem.
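To make the idea concrete, the following is a minimal Python sketch of the parallel rollout decision rule on an invented toy MDP (the line-world, the 0.8 success probability, and both heuristic policies are illustrative assumptions, not from the paper). At each state it estimates, by Monte Carlo simulation, the one-step lookahead value where the future is scored by the best of the base policies, and picks the maximizing action:

```python
import random

random.seed(0)

# Toy MDP (illustrative): states 0..N on a line; 0 and N are absorbing exits.
# Each step costs 1, so an optimal policy heads for the nearer exit.
N, GAMMA, HORIZON = 10, 0.95, 40

def step(s, a):
    """Noisy transition: the intended move (a = -1 or +1) succeeds w.p. 0.8."""
    if random.random() < 0.8:
        s = max(0, min(N, s + a))
    return s, -1.0  # (next state, per-step reward)

def value_estimate(s, policy, n=10):
    """Monte Carlo estimate of the discounted value of `policy` from s."""
    total = 0.0
    for _ in range(n):
        x, disc, ret = s, 1.0, 0.0
        for _ in range(HORIZON):
            if x in (0, N):  # absorbed at an exit
                break
            x, r = step(x, policy(x))
            ret += disc * r
            disc *= GAMMA
        total += ret
    return total / n

go_left  = lambda s: -1  # heuristic 1: near-optimal when s is close to 0
go_right = lambda s: +1  # heuristic 2: near-optimal when s is close to N

def parallel_rollout_action(s, policies=(go_left, go_right), samples=50):
    """argmax over a of E[ r + gamma * max_pi V_pi(s') ], by simulation."""
    best_a, best_q = None, float("-inf")
    for a in (-1, +1):
        total = 0.0
        for _ in range(samples):
            s2, r = step(s, a)
            # Score the future by the BEST base policy from each sampled s':
            # this is what lets the combined policy adapt per system path.
            total += r + GAMMA * max(value_estimate(s2, pi) for pi in policies)
        q = total / samples
        if q > best_q:
            best_a, best_q = a, q
    return best_a

# The combined policy matches go_left near 0 and go_right near N.
print(parallel_rollout_action(2), parallel_rollout_action(8))
```

Note the max over base policies sits inside the expectation over next states, so the rollout policy can follow a different heuristic on different sampled paths; with a single base policy the rule reduces to ordinary rollout.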