Exploring provenance in a distributed job execution system

  • Authors:
  • Christine F. Reilly; Jeffrey F. Naughton

  • Affiliations:
  • Department of Computer Sciences, University of Wisconsin–Madison, Madison, Wisconsin; Department of Computer Sciences, University of Wisconsin–Madison, Madison, Wisconsin

  • Venue:
  • IPAW'06 Proceedings of the 2006 international conference on Provenance and Annotation of Data
  • Year:
  • 2006

Abstract

We examine provenance in the context of a distributed job execution system. Capturing provenance information while a job executes in a distributed environment is crucial because this information is often lost once the job has finished. In this paper we discuss the types of information available within a distributed job execution system, how to capture that information, and the burdens its capture places on the user and the system. We identify the data we consider essential to capture and discuss the collection of provenance in the Quill++ project of Condor. We conclude that important provenance information can be captured in a distributed job execution system with relatively little intrusion on the user or the system.