Priority rules for job shops with weighted tardiness costs
Management Science
Minmax earliness/tardiness scheduling in identical parallel machine system using genetic algorithms
ICC&IE '94: Proceedings of the 17th International Conference on Computers and Industrial Engineering
Average reward reinforcement learning: foundations, algorithms, and empirical results
Machine Learning (special issue on reinforcement learning)
Scheduling parallel machines to minimize total weighted and unweighted tardiness
Computers and Operations Research
Model-based average reward reinforcement learning
Artificial Intelligence
Improved heuristics for the n-job single-machine weighted tardiness problem
Computers and Operations Research
Parallel machine scheduling with earliness and tardiness penalties
Computers and Operations Research
Introduction to Reinforcement Learning
Parallel machine earliness and tardiness scheduling with proportional weights
Computers and Operations Research
Multi-Machine Scheduling - A Multi-Agent Learning Approach
ICMAS '98: Proceedings of the 3rd International Conference on Multi-Agent Systems
Scheduling unrelated parallel machines to minimize total weighted tardiness
Computers and Operations Research
A Weighted Modified Due Date Rule for Sequencing to Minimize Weighted Tardiness
Journal of Scheduling
Fast learning in networks of locally-tuned processing units
Neural Computation
Application of reinforcement learning for agent-based production scheduling
Engineering Applications of Artificial Intelligence
Mathematical and Computer Modelling: An International Journal
We address an unrelated parallel machine scheduling problem with R-learning, an average-reward reinforcement learning (RL) method. Jobs of different types arrive dynamically according to independent Poisson processes, so the arrival time and due date of each job are stochastic. We convert the scheduling problem into an RL problem by constructing elaborate state features, actions, and a reward function; the state features and actions are defined by fully exploiting prior domain knowledge. Minimizing the reward per decision time step is equivalent to minimizing the scheduling objective, i.e. mean weighted tardiness. We apply an on-line R-learning algorithm with function approximation to solve the RL problem. Computational experiments demonstrate that R-learning learns an optimal or near-optimal policy in a dynamic environment from experience, outperforming four effective heuristic priority rules (WSPT, WMDD, ATC and WCOVERT) on all test problems.
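To make the average-reward formulation concrete, here is a minimal sketch of the R-learning update rule (Schwartz's algorithm, cited above) on a hypothetical two-state toy problem. The toy dynamics, step sizes, and exploration rate are illustrative assumptions, not the paper's scheduling environment; in the paper the states, actions, and reward are built from the scheduling features described in the abstract.

```python
import random

random.seed(0)

ALPHA, BETA = 0.1, 0.01          # step sizes for Q-values and the gain rho (assumed)
N_STATES, N_ACTIONS = 2, 2
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
rho = 0.0                        # running estimate of average reward per step

def step(s, a):
    """Hypothetical toy dynamics: action 1 earns more reward on average."""
    r = (1.0 if a == 1 else 0.2) + random.uniform(-0.1, 0.1)
    return r, (s + a) % N_STATES

s = 0
for t in range(20000):
    greedy = max(range(N_ACTIONS), key=lambda x: Q[s][x])
    # epsilon-greedy exploration
    a = random.randrange(N_ACTIONS) if random.random() < 0.1 else greedy
    r, s2 = step(s, a)
    # R-learning TD error: (r - rho) replaces discounting
    delta = r - rho + max(Q[s2]) - Q[s][a]
    Q[s][a] += ALPHA * delta
    if a == greedy:              # rho is adjusted only on greedy steps
        rho += BETA * delta
    s = s2

print(round(rho, 2))  # rho approaches the greedy policy's average reward (roughly 1.0 here)
```

Note the key difference from discounted Q-learning: there is no discount factor; instead the estimated gain rho is subtracted from each reward, so Q-values converge to relative (bias) values and rho to the average reward of the learned policy.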