Detecting common scientific workflow fragments using templates and execution provenance

Authors:
Daniel Garijo; Oscar Corcho; Yolanda Gil
Affiliations:
Universidad Politécnica de Madrid, Madrid, Spain; Universidad Politécnica de Madrid, Madrid, Spain; University of Southern California, Los Angeles, USA
Venue:
Proceedings of the seventh international conference on Knowledge capture
Year:
2013

Abstract

Provenance plays a major role in understanding and reusing the methods applied in a scientific experiment, as it provides a record of the inputs, the processes carried out, and the use and generation of intermediate and final results. In the specific case of in-silico scientific experiments, a large variety of scientific workflow systems (e.g., Wings, Taverna, Galaxy, VisTrails) have been created to support scientists. All of these systems produce some form of provenance about the executions of the workflows that encode scientific experiments. However, provenance is normally recorded at a very low level of detail, which makes it difficult to understand what happened during execution. In this paper we propose an approach to automatically obtain abstractions from low-level provenance data by finding common workflow fragments in workflow execution provenance and relating them to templates. We have tested our approach with a dataset of workflows published by the Wings workflow system. Our results show that these abstractions highlight the most common abstract methods used in the executions of a repository and relate different runs and workflow templates to each other.
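
To make the idea of a "common workflow fragment" concrete, the sketch below counts how often a two-step dataflow fragment recurs across execution traces. It is a deliberately simplified illustration, not the method used in the paper: the run data, the component names, the `common_fragments` function, and the `min_support` threshold are all hypothetical, and it assumes each executed step has already been mapped to the abstract component type defined in its template.

```python
from collections import Counter

# Hypothetical toy provenance: each run is a list of (producer, consumer)
# dataflow edges, with every step already mapped to its abstract component type.
runs = {
    "run1": [("Stemmer", "TermWeighting"), ("TermWeighting", "Classifier")],
    "run2": [("Stemmer", "TermWeighting"), ("TermWeighting", "Clustering")],
    "run3": [("StopWords", "Stemmer"), ("Stemmer", "TermWeighting")],
}


def common_fragments(runs, min_support=2):
    """Return two-step fragments that occur in at least min_support runs."""
    support = Counter()
    for edges in runs.values():
        # Count each fragment once per run, so support is a number of runs.
        support.update(set(edges))
    return {frag: n for frag, n in support.items() if n >= min_support}


if __name__ == "__main__":
    for fragment, n_runs in common_fragments(runs).items():
        print(fragment, "appears in", n_runs, "runs")
```

Fragments larger than a single edge would need a frequent-subgraph mining technique; the edge-level count here only shows how recurring steps can link otherwise separate runs.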