Markov Decision Processes: Discrete Stochastic Dynamic Programming
Learning to Predict by the Methods of Temporal Differences
Machine Learning
Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley Series in Probability and Statistics)
Control of a Re-Entrant Line Manufacturing Model with a Reinforcement Learning Approach
ICMLA '07 Proceedings of the Sixth International Conference on Machine Learning and Applications
Dynamic Programming and Optimal Control, Vol. II
Winter Simulation Conference
Brief paper: Average cost temporal-difference learning
Automatica (Journal of IFAC)
Guided by Little's law, decision and control models for operations in reentrant line manufacturing (RLM) systems are commonly set up to minimize total work-in-process (WIP), which in turn indirectly minimizes cycle time (CT). Viewing the problem fundamentally differently, we reformulate it as one of selecting the cost function that leads to optimal cycle times. We present the details and results of an extended simulation study, based on a benchmark problem, using a simulation-based approximate dynamic programming method with a newly proposed extended actor-critic architecture. Our results support the idea that a Markov decision process model can serve as a flexible platform for exploring different cost formulations, leading to the selection of an optimal cost function and model that optimize cycle time directly.
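The premise above rests on Little's law (WIP = throughput × CT), and the method is a simulation-based actor-critic scheme over an MDP with a chosen cost function. The following is a minimal, generic sketch of a one-step actor-critic minimizing discounted cost on a toy 2-state, 2-action MDP; it is not the paper's extended architecture, and the dynamics, costs, and learning rates are illustrative assumptions only.

```python
import math
import random

random.seed(0)

# Generic one-step actor-critic on a toy 2-state, 2-action MDP.
# NOT the paper's extended architecture: the dynamics, costs, and
# step sizes below are hypothetical, chosen only for illustration.

def step(state, action):
    """Hypothetical dynamics: action 1 is uniformly cheaper."""
    cost = 1.0 if action == 0 else 0.5
    return (state + action) % 2, cost

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

n_states, n_actions = 2, 2
gamma, alpha_v, alpha_p = 0.95, 0.1, 0.02

V = [0.0] * n_states                                  # critic: expected cost-to-go
theta = [[0.0] * n_actions for _ in range(n_states)]  # actor: action preferences

state = 0
for _ in range(10000):
    probs = softmax(theta[state])
    action = random.choices(range(n_actions), weights=probs)[0]
    next_state, cost = step(state, action)

    # Temporal-difference error of the critic (cost formulation).
    delta = cost + gamma * V[next_state] - V[state]
    V[state] += alpha_v * delta

    # Softmax policy-gradient step; descend because we minimize cost.
    for a in range(n_actions):
        grad = (1.0 if a == action else 0.0) - probs[a]
        theta[state][a] -= alpha_p * delta * grad

    state = next_state

# After training, the actor prefers the cheaper action in both states.
```

Swapping in a different cost function only changes `step`; the learning loop is untouched, which is the flexibility the abstract attributes to the MDP formulation.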