Modeling user runtime estimates

Authors:
Dan Tsafrir;Yoav Etsion;Dror G. Feitelson
Affiliations:
School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel;School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel;School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
Venue:
JSSPP'05 Proceedings of the 11th international conference on Job Scheduling Strategies for Parallel Processing
Year:
2005

Abstract

User estimates of job runtimes have emerged as an important component of the workload on parallel machines. They can significantly affect how a scheduler treats different jobs, and thus overall performance. It is therefore highly desirable to have a good model of the relationship between parallel jobs and their associated estimates. We construct such a model based on a detailed analysis of several workload traces. The model incorporates the features that are consistent across all of the logs, most notably the inherently modal nature of estimates (e.g., only 20 different values are used as estimates for about 90% of the jobs). We find that user behavior, as manifested through the estimate distributions, is remarkably similar across the different workload traces. Indeed, providing our model with only the maximal allowed estimate value, along with the percentage of jobs that used it, yields results that are very similar to the original. The remaining difference (if any) is largely eliminated by providing information on one or two additional popular estimates. Consequently, compared with previous models, simulations that use our model are better at reproducing the scheduling behavior observed with real estimates.
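
The modal nature of estimates described in the abstract can be illustrated with a minimal sketch. It is not the authors' model; it only measures, for a list of per-job estimates, how much of the workload is covered by the k most popular estimate values, and what share of jobs used the maximal estimate (the two quantities the abstract names as model inputs). The function names `top_k_coverage` and `max_estimate_share` and the sample values are assumptions made up for illustration, not data from any trace.

```python
from collections import Counter


def top_k_coverage(estimates, k):
    """Fraction of jobs whose estimate is one of the k most popular values."""
    if not estimates:
        raise ValueError("estimates must be non-empty")
    counts = Counter(estimates)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(estimates)


def max_estimate_share(estimates):
    """Return the largest estimate and the fraction of jobs that used it."""
    if not estimates:
        raise ValueError("estimates must be non-empty")
    top = max(estimates)
    return top, estimates.count(top) / len(estimates)


# Hypothetical per-job estimates in seconds, invented for illustration only.
sample = [3600, 3600, 900, 64800, 64800, 64800, 300, 3600, 1234, 64800]

print(top_k_coverage(sample, 2))   # 0.7: the two most popular values cover 7 of 10 jobs
print(max_estimate_share(sample))  # (64800, 0.4): 4 of 10 jobs used the maximal estimate
```

On a real trace, a statement such as "20 values cover about 90% of the jobs" would correspond to `top_k_coverage(estimates, 20)` being close to 0.9.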