Using domain-specific data to enhance scientific workflow steering queries

Authors:
João Carlos de A.R. Gonçalves;Daniel de Oliveira;Kary A. C. S. Ocaña;Eduardo Ogasawara;Marta Mattoso
Affiliations:
COPPE, Federal University of Rio de Janeiro, Brazil;COPPE, Federal University of Rio de Janeiro, Brazil;COPPE, Federal University of Rio de Janeiro, Brazil;COPPE, Federal University of Rio de Janeiro, Brazil, CEFET/RJ, Brazil;COPPE, Federal University of Rio de Janeiro, Brazil
Venue:
IPAW'12 Proceedings of the 4th international conference on Provenance and Annotation of Data and Processes
Year:
2012

Abstract

In scientific workflows, provenance data helps scientists understand, evaluate, and reproduce their results. Provenance data generated at runtime can also support workflow steering mechanisms. Providing steering facilities for workflows is considered a challenge because of the dynamic demands that arise during execution. To steer a workflow, scientists should be able, for example, to suspend (or stop) its execution when the approximate solution meets (or deviates from) preset criteria. These criteria are commonly evaluated using both provenance data (execution data) and domain-specific data. We claim that the final decision on whether to interfere with the workflow execution may only become feasible when scientists can steer workflows using provenance data enriched with domain-specific data. In this paper, we propose an approach based on specialized software components, named Data Extractor (DE), to acquire domain-specific data from the data files produced during a scientific workflow execution. A DE gathers domain-specific data from the produced data files and associates it with the existing provenance data in the provenance repository. We evaluated the proposed approach using a real bioinformatics workflow for comparative genomics executed on the SciCumulus parallel cloud workflow engine.
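
The idea of enriching provenance with domain-specific data can be illustrated with a minimal sketch. It is not the paper's implementation: the table layout, the `name=value` file format, the function names, and the `e_value` threshold are all assumptions made for illustration, and SQLite stands in for whatever provenance repository the engine actually uses. The sketch shows an extractor that parses a produced file, links the extracted values to an existing activation record, and a steering query that joins both kinds of data.

```python
import sqlite3

# Hypothetical schema: execution provenance plus extracted domain data.
SCHEMA = """
CREATE TABLE activation (
    id INTEGER PRIMARY KEY, activity TEXT, status TEXT, output_file TEXT);
CREATE TABLE domain_data (
    activation_id INTEGER REFERENCES activation(id), name TEXT, value REAL);
"""


def extract_domain_data(text):
    """Hypothetical extractor: parse numeric 'name=value' lines."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, raw = line.partition("=")
        try:
            pairs.append((name.strip(), float(raw)))
        except ValueError:
            continue  # this sketch ignores non-numeric values
    return pairs


def register(conn, activation_id, text):
    """Associate extracted values with an existing provenance record."""
    rows = [(activation_id, n, v) for n, v in extract_domain_data(text)]
    conn.executemany("INSERT INTO domain_data VALUES (?, ?, ?)", rows)
    return len(rows)


def activations_violating(conn, name, threshold):
    """Steering query: join provenance with domain data on a criterion."""
    cur = conn.execute(
        "SELECT a.id, a.activity, d.value FROM activation a "
        "JOIN domain_data d ON d.activation_id = a.id "
        "WHERE d.name = ? AND d.value > ? ORDER BY a.id",
        (name, threshold),
    )
    return cur.fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO activation VALUES (1, 'alignment', 'FINISHED', 'run1.out')")
    conn.execute(
        "INSERT INTO activation VALUES (2, 'alignment', 'FINISHED', 'run2.out')")
    # File contents are inlined here; a real extractor would read output_file.
    register(conn, 1, "# hypothetical output\ne_value=1e-30\nscore=250\n")
    register(conn, 2, "e_value=0.5\nscore=12\n")
    return activations_violating(conn, "e_value", 1e-5)
```

Under these assumptions, a scientist monitoring the run could call `activations_violating` periodically and suspend the workflow once it returns any rows. Keeping the domain values in a separate table keyed by activation means the provenance records themselves stay unchanged when a new kind of extractor is added.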