Scaling up workflow-based applications

Authors:
Scott Callaghan;Ewa Deelman;Dan Gunter;Gideon Juve;Philip Maechling;Christopher Brooks;Karan Vahi;Kevin Milner;Robert Graves;Edward Field;David Okaya;Thomas Jordan
Affiliations:
University of Southern California, Los Angeles, CA 90089, United States;USC Information Sciences Institute, Marina Del Rey, CA 90292, United States;Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States;University of Southern California, Los Angeles, CA 90089, United States;University of Southern California, Los Angeles, CA 90089, United States;University of San Francisco, CA 94117, United States;USC Information Sciences Institute, Marina Del Rey, CA 90292, United States;University of Southern California, Los Angeles, CA 90089, United States;URS Corporation, Pasadena, CA 91101, United States;US Geological Survey, Pasadena, CA 91106, United States;University of Southern California, Los Angeles, CA 90089, United States;University of Southern California, Los Angeles, CA 90089, United States
Venue:
Journal of Computer and System Sciences
Year:
2010

Abstract

Scientific applications, often expressed as workflows, are making use of large-scale national cyberinfrastructure to explore the behavior of systems, search for phenomena in large-scale data, and conduct many other scientific endeavors. As the complexity of the systems being studied grows and as data set sizes increase, the scale of the computational workflows increases as well. In some cases, workflows now have hundreds of thousands of individual tasks. Managing such scale is difficult from the point of view of workflow description, execution, and analysis. In this paper, we describe the challenges faced by workflow management and performance analysis systems when dealing with an earthquake science application, CyberShake, executing on the TeraGrid. The scientific goal of the SCEC CyberShake project is to calculate probabilistic seismic hazard curves for sites in Southern California. For each site of interest, the CyberShake platform includes two large-scale MPI calculations and approximately 840,000 embarrassingly parallel post-processing jobs. We show how we approach the scalability challenges in our workflow management and log mining systems.