Scientific workflows are a common computational model for performing scientific simulations. They may include many jobs, many scientific codes, and many file dependencies. Since scientific workflow applications may include both high-performance computing (HPC) and high-throughput computing (HTC) jobs, meaningful performance metrics are difficult to define: neither traditional HPC metrics nor HTC metrics fully capture the scope of the application. We propose alternative metrics that accurately capture the scale of scientific workflows and quantify their efficiency. In this paper, we present several practical scientific workflow performance metrics and discuss them in the context of a large-scale scientific workflow application, the Southern California Earthquake Center CyberShake 1.0 Map calculation. Our metrics reflect both computational performance, such as floating-point operations and file access, and workflow performance, such as job and task scheduling and execution. We break down performance into three levels of granularity: the task, workflow, and application levels, presenting a complete view of application performance. We show how our proposed metrics can be used to compare multiple invocations of the same application, as well as executions of heterogeneous applications, quantifying both the amount of work performed and the efficiency of that work. Finally, we analyze CyberShake using our proposed metrics to identify potential application optimizations.