The importance of complete data sets for job scheduling simulations

Authors:
Dalibor Klusáček;Hana Rudová
Affiliations:
Faculty of Informatics, Masaryk University, Brno, Czech Republic;Faculty of Informatics, Masaryk University, Brno, Czech Republic
Venue:
JSSPP'10 Proceedings of the 15th international conference on Job scheduling strategies for parallel processing
Year:
2010

Abstract

This paper was inspired by a study of the complex data set from the Czech National Grid MetaCentrum. Unlike other widely used workloads from the Parallel Workloads Archive or the Grid Workloads Archive, this data set includes additional information about machine failures, job requirements, and machine parameters, which makes more realistic simulations possible. We show that large differences in the performance of various scheduling algorithms appear when this additional information is used. Moreover, we studied other publicly available workloads and partially reconstructed information about their machine failures and job requirements using statistical and analytical models, to demonstrate that similar behavior can also be expected for other workloads. We suggest that additional information about both machines and jobs should be incorporated into the workload archives to allow proper and more realistic simulations.
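
To make the abstract's point concrete, the following is a minimal sketch of how the set of machines available to a job can shrink once failure intervals and job requirements are taken into account. It is not the authors' simulator or data format: the `Machine` and `Job` records, the `eligible_machines` helper, the `"infiniband"` property, and the failure-interval representation are all hypothetical names chosen for illustration. Checking failures over the whole expected run is also a simplification, since a real scheduler does not know future failures in advance.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    cpus: int
    # Hypothetical machine parameters, e.g. interconnect or architecture tags.
    properties: frozenset = frozenset()
    # Hypothetical failure record: (start, end) intervals when the machine is down.
    failures: list = field(default_factory=list)

    def is_up(self, start, end):
        """Return True if no failure interval overlaps [start, end)."""
        return all(end <= f_start or start >= f_end
                   for f_start, f_end in self.failures)


@dataclass
class Job:
    job_id: int
    cpus: int
    runtime: int
    # Hypothetical job requirements: properties the target machine must have.
    required: frozenset = frozenset()


def eligible_machines(job, machines, now, complete=True):
    """List the names of machines that could run `job` starting at `now`.

    With complete=False only CPU counts are compared, which mimics a trace
    that lacks failure and requirement information.
    """
    result = []
    for m in machines:
        if m.cpus < job.cpus:
            continue
        if complete:
            if not job.required <= m.properties:
                continue  # machine lacks a required property
            if not m.is_up(now, now + job.runtime):
                continue  # machine is down during the job's run
        result.append(m.name)
    return result


machines = [
    Machine("a", 8, frozenset({"infiniband"})),
    Machine("b", 16, frozenset(), failures=[(50, 100)]),
    Machine("c", 4, frozenset({"infiniband"})),
]
job = Job(1, cpus=8, runtime=60, required=frozenset({"infiniband"}))

print(eligible_machines(job, machines, now=0, complete=False))
print(eligible_machines(job, machines, now=0, complete=True))
```

In this toy setup the incomplete view offers the job two candidate machines, while the complete view leaves only one. A difference of this kind in the available choices is one plausible way for scheduling algorithms to rank differently once the extra information is used.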