We examine provenance in the context of a distributed job execution system. Provenance must be captured while a job executes in a distributed environment, because this information is often lost once the job has finished. In this paper we discuss what information is available within a distributed job execution system, how to capture it, and what burdens capturing it places on the user and the system. We identify the data we consider essential to capture and discuss the collection of provenance in the Quill++ project of Condor. We conclude that important provenance information can be captured in a distributed job execution system with relatively little intrusion on the user or the system.