Exploring provenance in a distributed job execution system

Authors:
Christine F. Reilly; Jeffrey F. Naughton
Affiliations:
Department of Computer Sciences, University of Wisconsin–Madison, Madison, Wisconsin (both authors)
Venue:
IPAW'06 Proceedings of the 2006 international conference on Provenance and Annotation of Data
Year:
2006

Abstract

We examine provenance in the context of a distributed job execution system. Provenance information must be captured while a job executes in a distributed environment, because this information is often lost once the job has finished. In this paper we discuss the types of information available within a distributed job execution system, how to capture that information, and the burden that capturing it places on the user and the system. We identify the data that we consider essential to capture and discuss the collection of provenance in the Quill++ project of Condor. We conclude that important provenance information can be captured in a distributed job execution system with relatively little intrusion on the user or the system.