Backfilling with lookahead to optimize the packing of parallel jobs

Authors:
Edi Shmueli; Dror G. Feitelson
Affiliations:
Department of Computer Science, Haifa University, Haifa, Israel and IBM Haifa Research Laboratory, Israel; School of Computer Science and Engineering, Hebrew University, Jerusalem, Israel
Venue:
Journal of Parallel and Distributed Computing
Year:
2005

Abstract

The utilization of parallel computers depends on how jobs are packed together: if the jobs are not packed tightly, resources are lost due to fragmentation. The problem is that the goal of high utilization may conflict with goals of fairness or even progress for all jobs. The common solution is to use backfilling, which combines a reservation for the first job in the interest of progress with packing of later jobs to fill in holes and increase utilization. However, backfilling considers the queued jobs one at a time, and thus might miss better packing opportunities. We propose the use of dynamic programming to find the best packing possible given the current composition of the queue, thus maximizing the utilization at every scheduling step. Simulations of this algorithm, called lookahead optimizing scheduler (LOS), using trace files from several IBM SP parallel systems, show that LOS indeed improves utilization, and thereby reduces the mean response time and mean slowdown of all jobs. Moreover, it is possible to limit the lookahead depth to about 50 jobs and still achieve essentially the same results. Finally, we experimented with selecting among alternative sets of jobs that achieve the same utilization. Surprisingly, the results indicate that choosing the set at the head of the queue does not necessarily yield the best performance. Instead, repeatedly selecting the set with the maximal overall expected slowdown yields better performance than all other alternatives checked.
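To illustrate why a dynamic-programming packer can beat one-job-at-a-time backfilling, here is a minimal sketch of the basic packing step only. It assumes each queued job is described solely by its processor count and that no reservation for the first job needs protecting; it ignores runtime estimates entirely. The function name `best_packing` and its `lookahead` parameter are hypothetical, chosen for illustration. The selection is treated as a 0/1 knapsack in which each job's weight and value are both its processor count, restricted to the first `lookahead` queued jobs.

```python
from typing import List, Sequence, Tuple


def best_packing(sizes: Sequence[int], free: int,
                 lookahead: int = 50) -> Tuple[int, List[int]]:
    """Hypothetical sketch: pick queued jobs that use as many free processors as possible.

    sizes     -- processor counts of queued jobs, in queue order (assumed positive)
    free      -- number of currently idle processors
    lookahead -- only the first `lookahead` jobs are considered
    Returns (processors used, queue indices of the selected jobs).
    Simplification: no reservation for the first job, no runtime estimates.
    """
    window = list(sizes[:lookahead])
    n = len(window)
    # best[i][c] = max processors that jobs window[i:] can occupy within c processors
    best = [[0] * (free + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        s = window[i]
        for c in range(free + 1):
            skip = best[i + 1][c]
            take = s + best[i + 1][c - s] if s <= c else -1
            best[i][c] = max(skip, take)

    # Walk the queue from the head, taking a job whenever an optimal packing
    # still exists that includes it; ties are thus broken toward the head.
    chosen: List[int] = []
    c = free
    for i in range(n):
        s = window[i]
        if s <= c and s + best[i + 1][c - s] == best[i][c]:
            chosen.append(i)
            c -= s
    return best[0][free], chosen


if __name__ == "__main__":
    # 8 idle processors; jobs needing 4, 3, 3 and 2 processors are queued.
    # Taking jobs one at a time in queue order fills only 7 (4 + 3);
    # the knapsack finds 3 + 3 + 2 = 8.
    print(best_packing([4, 3, 3, 2], free=8))  # (8, [1, 2, 3])
```

The reconstruction loop is where a tie-breaking policy lives: as written it prefers jobs nearer the head of the queue, which the abstract reports is not necessarily the best choice. A different rule, such as preferring the set with the largest total expected slowdown, would change only that loop.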