A Case Study into Using Common Real-Time Workflow Monitoring Infrastructure for Scientific Workflows

Authors:
Karan Vahi;Ian Harvey;Taghrid Samak;Daniel Gunter;Kieran Evans;David Rogers;Ian Taylor;Monte Goode;Fabio Silva;Eddie Al-Shakarchi;Gaurang Mehta;Ewa Deelman;Andrew Jones
Affiliations:
USC Information Sciences Institute, Marina Del Rey, USA;School of Computer Science, Cardiff, UK;Lawrence Berkeley National Laboratory, Berkeley, USA;Lawrence Berkeley National Laboratory, Berkeley, USA;School of Computer Science, Cardiff, UK;School of Computer Science, Cardiff, UK;School of Computer Science, Cardiff, UK;Lawrence Berkeley National Laboratory, Berkeley, USA;University of Southern California, Los Angeles, USA;School of Computer Science, Cardiff, UK;USC Information Sciences Institute, Marina Del Rey, USA;USC Information Sciences Institute, Marina Del Rey, USA;School of Computer Science, Cardiff, UK
Venue:
Journal of Grid Computing
Year:
2013

Abstract

Scientific workflow systems support various workflow representations, operational modes, and configurations. Regardless of the system used, end users have common needs: to track the status of their workflows in real time, to be notified automatically of execution anomalies and failures, to troubleshoot, and to automate the analysis of workflow results. In this paper, we describe how the Stampede monitoring infrastructure was integrated with the Pegasus Workflow Management System and the Triana workflow system to add generic real-time monitoring and troubleshooting capabilities across both systems. Stampede provides interoperable monitoring through a three-layer model: (1) a common data model that describes workflow and job executions; (2) high-performance tools that load workflow logs conforming to this data model into a data store; and (3) a common query interface. We detail the integration of the Stampede monitoring architecture with Pegasus and Triana and show the new analysis capabilities that Stampede brings to both workflow systems. The successful integration of Stampede with these two engines demonstrates the generic nature of the Stampede monitoring infrastructure and its potential to serve as a common monitoring platform across scientific workflow engines.
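The three-layer model can be illustrated with a minimal Python sketch. It is not Stampede's actual schema or API. The `JobEvent` fields, the `job_event` table, and the `load_events` and `failed_jobs` functions are hypothetical, and an in-memory SQLite database stands in for the data store. The sketch shows the core idea: once each engine's logs are normalized to one data model, a single loader and a single query serve Pegasus and Triana alike.

```python
"""Minimal sketch of a three-layer monitoring model (all names hypothetical).

Layer 1: a common data model for workflow and job events.
Layer 2: a loader that writes normalized events into a data store (SQLite here).
Layer 3: a query interface shared by every workflow engine.
"""
import sqlite3
from dataclasses import dataclass


# Layer 1: common data model. Field names are illustrative assumptions,
# not Stampede's actual schema.
@dataclass(frozen=True)
class JobEvent:
    workflow_id: str   # engine-independent workflow identifier
    job_id: str        # job identifier within the workflow
    engine: str        # e.g. "pegasus" or "triana"
    state: str         # e.g. "SUBMIT", "SUCCESS", "FAILURE"
    timestamp: float   # seconds since the epoch


# Layer 2: loader. Each engine-specific adapter emits JobEvent records,
# so one loader serves all engines.
def load_events(conn: sqlite3.Connection, events: list[JobEvent]) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_event ("
        "workflow_id TEXT, job_id TEXT, engine TEXT, state TEXT, ts REAL)"
    )
    conn.executemany(
        "INSERT INTO job_event VALUES (?, ?, ?, ?, ?)",
        [(e.workflow_id, e.job_id, e.engine, e.state, e.timestamp) for e in events],
    )
    conn.commit()


# Layer 3: common query interface; it never inspects which engine ran the job.
def failed_jobs(conn: sqlite3.Connection, workflow_id: str) -> list[str]:
    rows = conn.execute(
        "SELECT DISTINCT job_id FROM job_event "
        "WHERE workflow_id = ? AND state = 'FAILURE' ORDER BY job_id",
        (workflow_id,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_events(conn, [
        JobEvent("wf-1", "job-a", "pegasus", "SUBMIT", 0.0),
        JobEvent("wf-1", "job-a", "pegasus", "SUCCESS", 5.0),
        JobEvent("wf-1", "job-b", "pegasus", "FAILURE", 7.0),
        JobEvent("wf-2", "unit-x", "triana", "FAILURE", 3.0),
    ])
    print(failed_jobs(conn, "wf-1"))  # ['job-b']
    print(failed_jobs(conn, "wf-2"))  # ['unit-x']
```

The design choice worth noting is that engine-specific knowledge lives only in whatever code produces `JobEvent` records; the loader and the query layer are shared, which is what lets one troubleshooting tool work across engines.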