Provenance from log files: a BigData problem

  • Authors:
  • Devarshi Ghoshal;Beth Plale

  • Affiliations:
  • Indiana University, Bloomington, IN;Indiana University, Bloomington, IN

  • Venue:
  • Proceedings of the Joint EDBT/ICDT 2013 Workshops
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

As new data products of research increasingly become the product or output of complex processes, the lineage of the resulting products takes on greater importance as a description of the processes that contributed to the result. Without adequate description of data products, their reuse is lessened. The act of instrumenting an application for provenance capture is burdensome, however. This paper explores the option of deriving provenance from existing log files, an approach that reduces the instrumentation task substantially but raises questions about sifting through huge amounts of information for what may or may not be complete provenance. In this paper we study the tradeoff of ease of capture and provenance completeness, and show that under some circumstances capture through logs can result in high quality provenance.