Priority rules for job shops with weighted tardiness costs
Management Science
Minmax earliness/tardiness scheduling in identical parallel machine system using genetic algorithms
ICC&IE '94: Proceedings of the 17th International Conference on Computers and Industrial Engineering
Average reward reinforcement learning: foundations, algorithms, and empirical results
Machine Learning (special issue on reinforcement learning)
Scheduling parallel machines to minimize total weighted and unweighted tardiness
Computers and Operations Research
Model-based average reward reinforcement learning
Artificial Intelligence
Improved heuristics for the n-job single-machine weighted tardiness problem
Computers and Operations Research
Parallel machine scheduling with earliness and tardiness penalties
Computers and Operations Research
Introduction to Reinforcement Learning
Parallel machine earliness and tardiness scheduling with proportional weights
Computers and Operations Research
Multi-Machine Scheduling - A Multi-Agent Learning Approach
ICMAS '98: Proceedings of the 3rd International Conference on Multi-Agent Systems
Scheduling unrelated parallel machines to minimize total weighted tardiness
Computers and Operations Research
A Weighted Modified Due Date Rule for Sequencing to Minimize Weighted Tardiness
Journal of Scheduling
Fast learning in networks of locally-tuned processing units
Neural Computation
Application of reinforcement learning for agent-based production scheduling
Engineering Applications of Artificial Intelligence
Mathematical and Computer Modelling: An International Journal
We address an unrelated parallel machine scheduling problem with R-learning, an average-reward reinforcement learning (RL) method. Jobs of different types arrive dynamically according to independent Poisson processes, so the arrival time and due date of each job are stochastic. We convert the scheduling problem into an RL problem by constructing elaborate state features, actions, and a reward function; the state features and actions are defined by fully exploiting prior domain knowledge. Minimizing the reward per decision time step is equivalent to minimizing the scheduling objective, i.e. mean weighted tardiness. We apply an on-line R-learning algorithm with function approximation to solve the RL problem. Computational experiments demonstrate that R-learning learns an optimal or near-optimal policy in a dynamic environment from experience, outperforming four effective heuristic priority rules (WSPT, WMDD, ATC and WCOVERT) on all test problems.
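To make the average-reward formulation concrete, here is a minimal sketch of the R-learning update rule (Schwartz's algorithm, cited above) on a hypothetical two-state toy problem. The toy dynamics, step sizes, and exploration rate are illustrative assumptions, not the paper's scheduling environment; in the paper the states, actions, and reward are built from the scheduling features described in the abstract.

```python
import random

random.seed(0)

ALPHA, BETA = 0.1, 0.01          # step sizes for Q-values and the gain rho (assumed)
N_STATES, N_ACTIONS = 2, 2
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
rho = 0.0                        # running estimate of average reward per step

def step(s, a):
    """Hypothetical toy dynamics: action 1 earns more reward on average."""
    r = (1.0 if a == 1 else 0.2) + random.uniform(-0.1, 0.1)
    return r, (s + a) % N_STATES

s = 0
for t in range(20000):
    greedy = max(range(N_ACTIONS), key=lambda x: Q[s][x])
    # epsilon-greedy exploration
    a = random.randrange(N_ACTIONS) if random.random() < 0.1 else greedy
    r, s2 = step(s, a)
    # R-learning TD error: (r - rho) replaces discounting
    delta = r - rho + max(Q[s2]) - Q[s][a]
    Q[s][a] += ALPHA * delta
    if a == greedy:              # rho is adjusted only on greedy steps
        rho += BETA * delta
    s = s2

print(round(rho, 2))  # rho approaches the greedy policy's average reward (roughly 1.0 here)
```

Note the key difference from discounted Q-learning: there is no discount factor; instead the estimated gain rho is subtracted from each reward, so Q-values converge to relative (bias) values and rho to the average reward of the learned policy.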