Scientific applications, often expressed as workflows, make use of large-scale national cyberinfrastructure to explore the behavior of systems, search for phenomena in large data sets, and pursue many other scientific endeavors. As the complexity of the systems being studied grows and data set sizes increase, the scale of the computational workflows grows as well; some workflows now comprise hundreds of thousands of individual tasks. Managing such scale is difficult from the standpoint of workflow description, execution, and analysis. In this paper, we describe the challenges faced by workflow management and performance analysis systems when dealing with an earthquake science application, CyberShake, executing on the TeraGrid. The scientific goal of the SCEC CyberShake project is to calculate probabilistic seismic hazard curves for sites in Southern California. For each site of interest, the CyberShake platform runs two large-scale MPI calculations and approximately 840,000 embarrassingly parallel post-processing jobs. We show how we approach the scalability challenges in our workflow management and log-mining systems.
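To give a sense of how workflows at this scale are made manageable, one widely used technique in workflow systems such as Pegasus is task clustering: many short tasks are grouped into a smaller number of clustered jobs so that per-job scheduling and queuing overhead is paid far less often. The sketch below is hypothetical illustration code, not the CyberShake implementation; the function name `cluster_tasks` and the cluster size of 1,000 are assumptions chosen for the example.

```python
# Hypothetical sketch of horizontal task clustering (not CyberShake code):
# partition a large set of short post-processing tasks into clustered jobs,
# reducing the number of jobs the scheduler must handle.

def cluster_tasks(task_ids, cluster_size):
    """Partition task_ids into contiguous groups of at most cluster_size."""
    return [task_ids[i:i + cluster_size]
            for i in range(0, len(task_ids), cluster_size)]

# Roughly the per-site CyberShake scale: ~840,000 post-processing tasks.
tasks = list(range(840_000))
clusters = cluster_tasks(tasks, 1000)  # assumed cluster size of 1,000 tasks
print(len(clusters))  # 840 clustered jobs instead of 840,000 individual ones
```

With a cluster size of 1,000, the scheduler sees 840 jobs rather than 840,000, at the cost of coarser failure granularity (a failed clustered job must rerun all of its member tasks).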