Workload characteristics of a multi-cluster supercomputer

Authors:
Hui Li;David Groep;Lex Wolters
Affiliations:
Leiden Institute of Advanced Computer Science (LIACS), Leiden University, The Netherlands;National Institute for Nuclear and High Energy Physics (NIKHEF), The Netherlands;Leiden Institute of Advanced Computer Science (LIACS), Leiden University, The Netherlands
Venue:
JSSPP'04 Proceedings of the 10th international conference on Job Scheduling Strategies for Parallel Processing
Year:
2004

Abstract

This paper presents a comprehensive characterization of a multi-cluster supercomputer workload based on twelve months of scientific research traces. The metrics characterized include system utilization, job arrival rate and interarrival time, job cancellation rate, job size (degree of parallelism), job runtime, memory usage, and user/group behavior. Correlations between metrics (job runtime and memory usage, requested and actual runtime, etc.) are identified and studied extensively. Differences from previously reported workloads are identified, and statistical distributions are fitted so that synthetic workloads with the same characteristics can be generated. This study provides a realistic basis for experiments in resource management and for evaluating different scheduling strategies in a multi-cluster research environment.
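
As an illustration of what fitting a distribution and generating a synthetic workload can look like, here is a minimal Python sketch. It is not the authors' method: the abstract does not name the distributions used, so the lognormal family is an assumption made here for illustration, and the function names (`fit_lognormal`, `synthetic_interarrivals`, `interarrivals_from_arrivals`) are hypothetical.

```python
import math
import random
import statistics


def interarrivals_from_arrivals(arrival_times):
    """Turn job arrival timestamps (seconds) into interarrival gaps."""
    ordered = sorted(arrival_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def fit_lognormal(samples):
    """Maximum-likelihood lognormal fit: mean and std. dev. of log(samples).

    Assumption: a lognormal is used purely as an example family.
    Non-positive samples (e.g. simultaneous arrivals) are skipped.
    """
    logs = [math.log(x) for x in samples if x > 0]
    if len(logs) < 2:
        raise ValueError("need at least two positive samples")
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)  # population std. dev. is the MLE
    return mu, sigma


def synthetic_interarrivals(mu, sigma, n, seed=0):
    """Draw n synthetic interarrival times from the fitted lognormal."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


if __name__ == "__main__":
    # Stand-in data: a real study would read timestamps from a trace file.
    observed = synthetic_interarrivals(3.0, 1.2, 5000, seed=1)
    mu, sigma = fit_lognormal(observed)
    print(f"fitted mu={mu:.3f}, sigma={sigma:.3f}")
    print(synthetic_interarrivals(mu, sigma, 3, seed=2))
```

The same fit-then-sample pattern would apply to other metrics named in the abstract, such as job runtime or memory usage. A real study would also compare candidate families with a goodness-of-fit test before choosing one.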