Job Management Requirements for NAS Parallel Systems and Clusters
IPPS '95 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Job Scheduling Under the Portable Batch System
IPPS '95 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
A Comparison of Workload Traces from Two Production Parallel Machines
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Utilization and Predictability in Scheduling the IBM SP2 with Backfilling
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Production Job Scheduling for Parallel Shared Memory Systems
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
A Model for Moldable Supercomputer Jobs
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
IPDPS '00/JSSPP '00 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
The Influence of the Structure and Sizes of Jobs on the Performance of Co-allocation
IPDPS '00/JSSPP '00 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
The Influence of Communication on the Performance of Co-allocation
JSSPP '01 Revised Papers from the 7th International Workshop on Job Scheduling Strategies for Parallel Processing
Selective Reservation Strategies for Backfill Job Scheduling
JSSPP '02 Revised Papers from the 8th International Workshop on Job Scheduling Strategies for Parallel Processing
Benefit of Limited Time Sharing in the Presence of Very Large Parallel Jobs
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
On the Scalability of Centralized Control
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 18 - Volume 19
Backfilling with lookahead to optimize the packing of parallel jobs
Journal of Parallel and Distributed Computing
PV-EASY: a strict fairness guaranteed and prediction enabled scheduler in parallel job scheduling
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Optimal job packing, a backfill scheduling optimization for a cluster of workstations
The Journal of Supercomputing
Multisite co-allocation algorithms for computational grid
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Using inaccurate estimates accurately
JSSPP'10 Proceedings of the 15th international conference on Job scheduling strategies for parallel processing
Capacity estimation in HPC systems: simulation approach
ICDCIT'11 Proceedings of the 7th international conference on Distributed computing and internet technology
Are user runtime estimates inherently inaccurate?
JSSPP'04 Proceedings of the 10th international conference on Job Scheduling Strategies for Parallel Processing
Pitfalls in parallel job scheduling evaluation
JSSPP'05 Proceedings of the 11th international conference on Job Scheduling Strategies for Parallel Processing
An analysis of computational workloads for the ORNL Jaguar system
Proceedings of the 26th ACM international conference on Supercomputing
Hierarchical scheduling strategies for parallel tasks and advance reservations in grids
Journal of Scheduling
Hi-index | 0.00 |
The NAS facility has operated parallel supercomputers for the past 11 years, including the Intel iPSC/860, Intel Paragon, Thinking Machines CM-5, IBM SP-2, and Cray Origin 2000. Across this wide variety of machine architectures, across a span of 10 years, across a large number of different users, and through thousands of minor configuration and policy changes, the utilization of these machines shows three general trends: (1) scheduling using a naive FCFS first-fit policy results in 40-60% utilization, (2) switching to the more sophisticated dynamic backfilling scheduling algorithm improves utilization by about 15 percentage points (yielding about 70% utilization), and (3) reducing the maximum allowable job size further increases utilization. Most surprising is the consistency of these trends. Over the lifetime of the NAS parallel systems, we made hundreds, perhaps thousands, of small changes to hardware, software, and policy, yet utilization was affected little. In particular, these results show that the goal of achieving near 100% utilization while supporting a real parallel supercomputing workload is unrealistic.