Enhancing and abstracting scientific workflow provenance for data publishing

Authors:
Pinar Alper (University of Manchester, Manchester, UK)
Khalid Belhajjame (University of Manchester, Manchester, UK)
Carole A. Goble (University of Manchester, Manchester, UK)
Pinar Karagoz (Middle East Technical University, Ankara, Turkey)
Venue: Proceedings of the Joint EDBT/ICDT 2013 Workshops
Year: 2013

Abstract

Many scientists use workflows to systematically design and run computational experiments. Once a workflow has been executed, the scientist may want to publish the resulting dataset so that, e.g., other scientists can reuse it as input to their own experiments. To do so, the scientist needs to curate the dataset by specifying metadata that describe it, e.g., its derivation history, origins and ownership. To assist the scientist in this task, in this paper we explore the use of the provenance traces that workflow management systems collect when enacting workflows. Specifically, we identify the shortcomings of such raw provenance traces in supporting data publishing, and propose an approach for deriving distilled, yet more informative, provenance traces that are fit for the data publishing task.
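The abstract does not say how a distilled trace is derived. Purely as an illustration of the general idea of abstracting a raw trace, the sketch below models a provenance trace as a mapping from process runs to the artifacts they used and generated, drops runs that the user has flagged as auxiliary (for example, a format conversion), and rewires their consumers to the auxiliary run's inputs. The trace representation, the `distill` and `derivation_history` functions, and the notion of an "auxiliary" run are hypothetical assumptions made for this example, not the authors' method. It covers only the abstracting side; enriching the trace with additional metadata is not shown, and it assumes the published dataset is not itself produced by an auxiliary run.

```python
from collections import deque

# Hypothetical raw trace: process run id -> (input artifact ids, output artifact ids).
Trace = dict[str, tuple[list[str], list[str]]]


def distill(trace: Trace, auxiliary: set[str]) -> Trace:
    """Drop auxiliary runs and rewire their consumers to the auxiliary runs' inputs.

    Assumption: the trace is acyclic and no final (published) artifact is
    produced by an auxiliary run.
    """
    # Each output of an auxiliary run is an alias for that run's inputs.
    alias: dict[str, list[str]] = {}
    for run, (ins, outs) in trace.items():
        if run in auxiliary:
            for out in outs:
                alias[out] = ins

    def resolve(artifact: str) -> list[str]:
        # Follow chains of auxiliary runs back to artifacts that are kept.
        if artifact not in alias:
            return [artifact]
        sources: list[str] = []
        for src in alias[artifact]:
            sources.extend(resolve(src))
        return sources

    distilled: Trace = {}
    for run, (ins, outs) in trace.items():
        if run in auxiliary:
            continue
        new_ins: list[str] = []
        for artifact in ins:
            for src in resolve(artifact):
                if src not in new_ins:  # avoid duplicate inputs after rewiring
                    new_ins.append(src)
        distilled[run] = (new_ins, list(outs))
    return distilled


def derivation_history(trace: Trace, artifact: str) -> list[str]:
    """Runs that transitively contributed to `artifact`, in breadth-first order."""
    producer = {out: run for run, (_, outs) in trace.items() for out in outs}
    seen: set[str] = set()
    order: list[str] = []
    queue = deque([artifact])
    while queue:
        current = queue.popleft()
        run = producer.get(current)
        if run is None or run in seen:
            continue  # external input, or a run already visited
        seen.add(run)
        order.append(run)
        queue.extend(trace[run][0])
    return order


if __name__ == "__main__":
    raw: Trace = {
        "download": (["query"], ["d1"]),
        "convert_format": (["d1"], ["d2"]),
        "analyse": (["d2", "params"], ["result"]),
    }
    slim = distill(raw, {"convert_format"})
    print(derivation_history(raw, "result"))   # ['analyse', 'convert_format', 'download']
    print(derivation_history(slim, "result"))  # ['analyse', 'download']
```

In this toy run, the published `result` keeps its link back to `download` and the `query` that fed it, while the format-conversion step disappears from its derivation history.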