Future Generation Computer Systems - Special issue on metacomputing
Advances in technology and the increasing number and scale of compute resources have enabled larger computational science experiments and given researchers many choices of where and how to store data and perform computation. Analyzing the time to completion of an experiment helps scientists make the best use of both human and computational resources, but doing so comprehensively is difficult because it involves experiment, system, and user variables, as well as their interactions under each configuration of systems. We present a simulation toolkit for analyzing computational science experiments and estimating their time to completion. Our approach uses a minimal description of the experiment's workflow, together with separate information about the systems being evaluated. We evaluate our approach using synthetic experiments that reflect actual workflow patterns, executed on systems from the NSF TeraGrid. Our evaluation focuses on ranking the available systems in order of expected experiment completion time. We show that with sufficient system information, the model can help investigate alternative systems and identify workflow bottlenecks. We also discuss the challenges posed by volatile queue wait time behavior and suggest methods to improve simulation accuracy for near-term workflow executions. Finally, we evaluate the impact of advance notice of predictable spikes in queue wait time caused by downtime and reservations, showing that such notice could increase the probability of a correct ranking for a sample of synthetic workflows from 59% to 74%, or even 79%.
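The core idea of ranking candidate systems by simulated completion time can be illustrated with a minimal sketch. This is not the paper's toolkit: the system names, per-submission queue-wait estimates, and relative speed factors below are hypothetical placeholders, and real queue delays are far more volatile than a single constant, as the abstract notes.

```python
# Hypothetical sketch: rank candidate systems by simulated workflow
# completion time. All names and numbers here are illustrative
# assumptions, not values or APIs from the paper's toolkit.

def simulate_completion(tasks, system):
    """Serially simulate a workflow on one system: each task first
    waits in the batch queue, then runs at the system's relative speed."""
    t = 0.0
    for base_runtime in tasks:
        t += system["queue_wait"]            # estimated queue delay per submission (s)
        t += base_runtime / system["speed"]  # compute time scaled by relative speed
    return t

def rank_systems(tasks, systems):
    """Return system names ordered by predicted completion time (fastest first)."""
    return sorted(systems, key=lambda name: simulate_completion(tasks, systems[name]))

# Task runtimes (seconds) measured on a reference system.
workflow = [120.0, 300.0, 60.0]

# Candidate systems: relative speed and a constant queue-wait estimate.
systems = {
    "clusterA": {"speed": 1.0, "queue_wait": 600.0},
    "clusterB": {"speed": 2.0, "queue_wait": 1800.0},
}

print(rank_systems(workflow, systems))  # clusterB's faster CPUs lose to its longer queues
```

In this toy configuration, clusterA finishes in 2280 s versus 5640 s for clusterB, so the ranking is driven by queue wait rather than compute speed, mirroring the abstract's point that volatile queue behavior dominates the accuracy of such predictions.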