A model for user-oriented data provenance in pipelined scientific workflows

Authors:
Shawn Bowers;Timothy McPhillips;Bertram Ludäscher;Shirley Cohen;Susan B. Davidson
Affiliations:
UC Davis Genome Center, University of California, Davis;UC Davis Genome Center, University of California, Davis;UC Davis Genome Center, University of California, Davis;Computer and Information Science, University of Pennsylvania;Computer and Information Science, University of Pennsylvania
Venue:
IPAW'06 Proceedings of the 2006 international conference on Provenance and Annotation of Data
Year:
2006

Abstract

Integrated provenance support promises to be a chief advantage of scientific workflow systems over script-based alternatives. It is often recognized that information gathered during scientific workflow execution can be used automatically to increase fault tolerance (via checkpointing) and to optimize performance (by reusing intermediate data products in future runs). It is perhaps more significant that scientists may also use provenance information to reproduce results from earlier runs, to explain unexpected results, and to prepare results for publication. Current workflow systems offer little or no direct support for these “scientist-oriented” queries of provenance information. Indeed, the use of advanced execution models in scientific workflows (e.g., process networks, which exhibit pipeline parallelism over streaming data) and the failure to record certain fundamental events, such as state resets of processes, can render existing provenance schemas useless for scientific applications of provenance. We develop a simple provenance model that is capable of supporting a wide range of scientific use cases, even for complex models of computation such as process networks. Our approach reduces these use cases to database queries over event logs and is capable of reconstructing complete data and invocation dependency graphs for a workflow run.
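
To make the idea of "database queries over event logs" concrete, the following is a minimal sketch and not the paper's actual model. The `event` table, its columns, the actor names (`align`, `tree`), and the token names are all hypothetical. It also assumes that every read and write can be attributed to a single stateless invocation; the abstract points out that real process networks additionally need events such as state resets to draw these boundaries correctly.

```python
import sqlite3

# Hypothetical event log: each row records that one invocation of an actor
# read ('r') or wrote ('w') one data token. Schema and values are illustrative.
EVENTS = [
    ("align", 1, "r", "seq1"),
    ("align", 1, "w", "aln1"),
    ("align", 2, "r", "seq2"),
    ("align", 2, "w", "aln2"),
    ("tree", 1, "r", "aln1"),
    ("tree", 1, "r", "aln2"),
    ("tree", 1, "w", "tree1"),
]


def load(events):
    """Load the event log into an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE event (actor TEXT, invocation INTEGER, kind TEXT, token TEXT)"
    )
    db.executemany("INSERT INTO event VALUES (?, ?, ?, ?)", events)
    return db


def data_dependencies(db):
    """Edges (written, read): a token written by an invocation that read another."""
    return db.execute(
        """SELECT DISTINCT w.token, r.token
             FROM event w JOIN event r
               ON w.actor = r.actor AND w.invocation = r.invocation
            WHERE w.kind = 'w' AND r.kind = 'r'
            ORDER BY w.token, r.token"""
    ).fetchall()


def invocation_dependencies(db):
    """Edges (consumer actor, consumer invocation, producer actor, producer invocation)."""
    return db.execute(
        """SELECT DISTINCT r.actor, r.invocation, w.actor, w.invocation
             FROM event r JOIN event w ON r.token = w.token
            WHERE r.kind = 'r' AND w.kind = 'w'
            ORDER BY r.actor, r.invocation, w.actor, w.invocation"""
    ).fetchall()


def lineage(db, token):
    """All tokens that `token` transitively depends on, via a recursive query."""
    rows = db.execute(
        """WITH RECURSIVE
             dep(out_tok, in_tok) AS (
               SELECT w.token, r.token
                 FROM event w JOIN event r
                   ON w.actor = r.actor AND w.invocation = r.invocation
                WHERE w.kind = 'w' AND r.kind = 'r'),
             anc(tok) AS (
               SELECT in_tok FROM dep WHERE out_tok = ?
               UNION
               SELECT dep.in_tok FROM dep JOIN anc ON dep.out_tok = anc.tok)
           SELECT tok FROM anc ORDER BY tok""",
        (token,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = load(EVENTS)
    print(data_dependencies(db))
    print(invocation_dependencies(db))
    print(lineage(db, "tree1"))
```

Under these assumptions, a question such as "which inputs contributed to this result?" becomes a single recursive query (`lineage`), and both dependency graphs are obtained by joining the log with itself.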