Towards automated collection of application-level data provenance

  • Authors: Dawood Tariq; Maisem Ali; Ashish Gehani
  • Affiliations: SRI International (all authors)
  • Venue: TaPP'12: Proceedings of the 4th USENIX Conference on Theory and Practice of Provenance
  • Year: 2012

Abstract

Gathering data provenance at the operating system level is useful for capturing system-wide activity. However, many modern programs are complex and can perform numerous tasks concurrently. Capturing their provenance at this level, where each process is treated as a single entity, may lose useful intra-process detail. This can, in turn, produce false dependencies in the provenance graph. Using the LLVM compiler framework and the SPADE provenance infrastructure, we investigate adding provenance instrumentation so that intra-process provenance can be captured automatically. This results in a more accurate representation of the provenance relationships and eliminates some false dependencies. Since capturing fine-grained provenance incurs increased overhead for storage and querying, we minimize the records retained by allowing users to declare aspects of interest and then automatically inferring which provenance records are unnecessary and can be discarded.
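
The difference between process-level and intra-process capture can be illustrated with a toy model. The sketch below is not the paper's implementation and uses neither LLVM nor SPADE; the event format, the function names, and the file names are all hypothetical. It assumes a deliberately simplified rule in which a write depends only on reads made by the same function, and it ignores data passed between functions.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical event format: (function_name, operation, file_name).
Event = Tuple[str, str, str]
Edge = Tuple[str, str]  # (output_file, input_file_it_depends_on)


def process_level_edges(events: Iterable[Event]) -> Set[Edge]:
    """Treat the process as one entity: each write depends on every earlier read."""
    edges: Set[Edge] = set()
    reads_so_far: List[str] = []
    for _func, op, path in events:
        if op == "read":
            reads_so_far.append(path)
        elif op == "write":
            edges.update((path, src) for src in reads_so_far)
    return edges


def function_level_edges(
    events: Iterable[Event], interest: Optional[Set[str]] = None
) -> Set[Edge]:
    """Track reads per function; optionally keep only functions of interest."""
    edges: Set[Edge] = set()
    reads_by_func: Dict[str, List[str]] = {}
    for func, op, path in events:
        if interest is not None and func not in interest:
            continue  # record falls outside the declared interest; discard it
        if op == "read":
            reads_by_func.setdefault(func, []).append(path)
        elif op == "write":
            edges.update((path, src) for src in reads_by_func.get(func, []))
    return edges


# Hypothetical trace of one process performing three unrelated tasks.
events: List[Event] = [
    ("load_config", "read", "app.conf"),
    ("load_config", "write", "cache.bin"),
    ("handle_request", "read", "input.csv"),
    ("handle_request", "write", "report.pdf"),
    ("write_log", "write", "app.log"),
]

coarse = process_level_edges(events)
fine = function_level_edges(events)
pruned = function_level_edges(events, interest={"handle_request"})
false_dependencies = coarse - fine

print(len(coarse), len(fine), len(pruned))
```

In this toy trace, the process-level view links `app.log` to both input files even though the logging function read neither, which is the kind of false dependency the abstract describes. Restricting capture to a declared function of interest then shrinks the retained records further.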