Provenance collection support in the Kepler scientific workflow system

Authors:
Ilkay Altintas;Oscar Barney;Efrat Jaeger-Frank
Affiliations:
San Diego Supercomputer Center, University of California, San Diego, CA;Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT;San Diego Supercomputer Center, University of California, San Diego, CA
Venue:
IPAW'06 Proceedings of the 2006 international conference on Provenance and Annotation of Data
Year:
2006

Abstract

In many data-driven applications, analysis must be performed on scientific information that is obtained from several sources and generated by computations on distributed resources. Systematic analysis of this information creates a growing need for automated data-driven applications that can also keep track of the provenance of the data and processes with little user interaction and overhead. Such data analysis can be facilitated by recent advances in scientific workflow systems. A major benefit of using scientific workflow systems is the ability to make provenance collection a part of the workflow. Specifically, provenance should include not only the standard data lineage information but also information about the context in which the workflow was used, the execution that processed the data, and the evolution of the workflow design. In this paper we describe a complete framework for data and process provenance in the Kepler Scientific Workflow System. We outline the requirements and issues related to data and workflow provenance in a multi-disciplinary workflow system and show how generic provenance capture can be facilitated in Kepler's actor-oriented workflow environment. We also describe how the stored provenance information can be used to rerun scientific workflows efficiently.
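
The idea of reusing stored provenance for an efficient rerun can be illustrated with a small sketch. This is not Kepler's API or the authors' implementation: `ProvenanceStore`, `run_pipeline`, and the (name, function) representation of actors are all hypothetical, and the sketch assumes that each actor is deterministic and that its name uniquely identifies its behavior. Each actor firing is recorded with its inputs and output, and a later run skips any actor whose firing on the same inputs is already on record.

```python
import hashlib
import json


class ProvenanceStore:
    """Hypothetical in-memory record of actor firings (not Kepler's API)."""

    def __init__(self):
        self.records = {}

    @staticmethod
    def key(actor_name, inputs):
        # Identify a firing by the actor name and a hash of its JSON-serialized inputs.
        payload = json.dumps(inputs, sort_keys=True).encode()
        return (actor_name, hashlib.sha256(payload).hexdigest())

    def lookup(self, actor_name, inputs):
        # Return the stored record for this firing, or None if it was never recorded.
        return self.records.get(self.key(actor_name, inputs))

    def record(self, actor_name, inputs, output):
        self.records[self.key(actor_name, inputs)] = {
            "inputs": inputs,
            "output": output,
        }


def run_pipeline(actors, value, store):
    """Run a linear pipeline of (name, function) actors.

    Returns the final value and the names of the actors that actually fired.
    Assumes every actor is deterministic, so a recorded output can stand in
    for a repeated firing on the same inputs.
    """
    fired = []
    for name, func in actors:
        cached = store.lookup(name, value)
        if cached is not None:
            # Reuse the recorded output instead of recomputing it.
            value = cached["output"]
        else:
            output = func(value)
            store.record(name, value, output)
            fired.append(name)
            value = output
    return value, fired
```

In this sketch, running the same pipeline twice on the same input fires no actors the second time, and replacing only the last actor recomputes only that step. A real system would also have to account for non-deterministic actors, changed parameters, and changes to the workflow design, which the sketch ignores.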