Future Generation Computer Systems - Special issue on metacomputing
Advances in technology and the increasing number and scale of compute resources have enabled larger computational science experiments and given researchers many choices of where and how to store data and perform computation. Analyzing the time to completion of an experiment helps scientists make the best use of both human and computational resources, but doing so comprehensively is difficult because it involves experiment, system, and user variables, as well as their interactions under each configuration of systems. We present a simulation toolkit for analyzing computational science experiments and estimating their time to completion. Our approach uses a minimal description of the experiment's workflow, together with separate information about the systems being evaluated. We evaluate our approach using synthetic experiments that reflect actual workflow patterns, executed on systems from the NSF TeraGrid. Our evaluation focuses on ranking the available systems in order of expected experiment completion time. We show that with sufficient system information, the model can help investigate alternative systems and identify workflow bottlenecks. We also discuss the challenges posed by volatile queue wait time behavior and suggest methods to improve simulation accuracy for near-term workflow executions. Finally, we evaluate the impact of advance notice of predictable spikes in queue wait time caused by downtime and reservations, showing that such notice could increase the probability of a correct ranking for a sample of synthetic workflows from 59% to 74%, or even 79%.
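The core idea of ranking candidate systems by simulated completion time can be illustrated with a minimal sketch. This is not the paper's toolkit: the system names, per-submission queue-wait estimates, and relative speed factors below are hypothetical placeholders, and real queue delays are far more volatile than a single constant, as the abstract notes.

```python
# Hypothetical sketch: rank candidate systems by simulated workflow
# completion time. All names and numbers here are illustrative
# assumptions, not values or APIs from the paper's toolkit.

def simulate_completion(tasks, system):
    """Serially simulate a workflow on one system: each task first
    waits in the batch queue, then runs at the system's relative speed."""
    t = 0.0
    for base_runtime in tasks:
        t += system["queue_wait"]            # estimated queue delay per submission (s)
        t += base_runtime / system["speed"]  # compute time scaled by relative speed
    return t

def rank_systems(tasks, systems):
    """Return system names ordered by predicted completion time (fastest first)."""
    return sorted(systems, key=lambda name: simulate_completion(tasks, systems[name]))

# Task runtimes (seconds) measured on a reference system.
workflow = [120.0, 300.0, 60.0]

# Candidate systems: relative speed and a constant queue-wait estimate.
systems = {
    "clusterA": {"speed": 1.0, "queue_wait": 600.0},
    "clusterB": {"speed": 2.0, "queue_wait": 1800.0},
}

print(rank_systems(workflow, systems))  # clusterB's faster CPUs lose to its longer queues
```

In this toy configuration, clusterA finishes in 2280 s versus 5640 s for clusterB, so the ranking is driven by queue wait rather than compute speed, mirroring the abstract's point that volatile queue behavior dominates the accuracy of such predictions.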