Reducing Time-to-Solution Using Distributed High-Throughput Mega-Workflows - Experiences from SCEC CyberShake

  • Authors:
  • Scott Callaghan, Philip Maechling, Ewa Deelman, Karan Vahi, Gaurang Mehta, Gideon Juve, Kevin Milner, Robert Graves, Edward Field, David Okaya, Dan Gunter, Keith Beattie, Thomas Jordan

  • Venue:
  • eScience '08: Proceedings of the 2008 Fourth IEEE International Conference on eScience
  • Year:
  • 2008

Abstract

Researchers at the Southern California Earthquake Center (SCEC) use large-scale grid-based scientific workflows to perform seismic hazard research as part of SCEC's program of earthquake system science research. The scientific goal of the SCEC CyberShake project is to calculate probabilistic seismic hazard curves for sites in Southern California. For each site of interest, the CyberShake platform runs two large-scale MPI calculations and approximately 840,000 embarrassingly parallel post-processing jobs. In this paper, we describe the computational requirements of CyberShake and detail how we meet them using grid-based, high-throughput, scientific workflow tools. We describe the specific challenges we encountered, discuss workflow throughput optimizations that reduced our time-to-solution by a factor of three, present runtime statistics, and propose further optimizations.
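
The abstract does not show the workflow structure itself, so the following is only a loose sketch of the fan-out it describes: two large MPI jobs feeding roughly 840,000 tiny post-processing tasks, with the small tasks grouped into batches so the grid scheduler sees far fewer submissions. The `Job` class, `build_site_workflow`, and the cluster size of 1,000 are illustrative assumptions, not details taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    """One node in the workflow DAG (hypothetical structure, not CyberShake's)."""
    name: str
    parents: list = field(default_factory=list)

def build_site_workflow(n_post: int = 840_000, cluster_size: int = 1_000):
    """Build a DAG shaped like the one the abstract describes: two
    large-scale MPI calculations followed by ~840,000 embarrassingly
    parallel post-processing jobs, clustered into batches so the
    scheduler handles hundreds of submissions instead of hundreds of
    thousands."""
    sgt_x = Job("mpi_calc_x")  # first large-scale MPI calculation
    sgt_y = Job("mpi_calc_y")  # second large-scale MPI calculation

    # Without clustering: one scheduler submission per tiny job.
    # With clustering: ceil(n_post / cluster_size) submissions, each
    # running cluster_size tasks back-to-back on a worker node.
    n_clusters = (n_post + cluster_size - 1) // cluster_size
    clusters = [
        Job(f"post_cluster_{i}", parents=[sgt_x, sgt_y])
        for i in range(n_clusters)
    ]
    hazard_curve = Job("aggregate_hazard_curve", parents=clusters)
    return [sgt_x, sgt_y, *clusters, hazard_curve]

if __name__ == "__main__":
    dag = build_site_workflow()
    # 843 DAG nodes instead of 840,003 without clustering
    print(f"DAG nodes after clustering: {len(dag):,}")
```

The point of a sketch like this is that when individual tasks run for only seconds, per-job scheduling and queuing overhead dominates the time-to-solution; batching many short tasks behind one submission is a standard way to raise throughput in high-throughput workflow systems, and is in the spirit of the optimizations the abstract reports.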