SPADE: support for provenance auditing in distributed environments

Authors:
Ashish Gehani;Dawood Tariq
Affiliations:
SRI International;SRI International
Venue:
Proceedings of the 13th International Middleware Conference
Year:
2012

References: 28
Cited by: 5

Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation

SSDBM '02 Proceedings of the 14th International Conference on Scientific and Statistical Database Management
Earth System Science Workbench: A Data Management Infrastructure for Earth Science Products

SSDBM '01 Proceedings of the 13th International Conference on Scientific and Statistical Database Management
Lineage retrieval for scientific data processing: a survey

ACM Computing Surveys (CSUR)
VisTrails: visualization meets data management

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Provenance semirings

Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Bigtable: a distributed storage system for structured data

OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Provenance-aware storage systems

ATEC '06 Proceedings of the annual conference on USENIX '06 Annual Technical Conference
Provenance for Visualizations: Reproducibility and Beyond

Computing in Science and Engineering
An annotation management system for relational databases

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Connecting Scientific Data to Scientific Experiments with Provenance

E-SCIENCE '07 Proceedings of the Third IEEE International Conference on e-Science and Grid Computing
Automatic capture and reconstruction of computational provenance

Concurrency and Computation: Practice & Experience - The First Provenance Challenge
Efficient lineage tracking for scientific workflows

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Steps toward managing lineage metadata in grid clusters

TAPP'09 First workshop on Theory and practice of provenance
Making a cloud provenance-aware

TAPP'09 First workshop on Theory and practice of provenance
Perm: Processing Provenance and Data on the Same Data Model through Query Rewriting

ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
On the Efficiency of Provenance Queries

ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Performance and extension of user space file systems

Proceedings of the 2010 ACM Symposium on Applied Computing
Efficient querying and maintenance of network provenance at internet-scale

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Distributed Storage and Querying Techniques for a Semantic Web of Scientific Workflow Provenance

SCC '10 Proceedings of the 2010 IEEE International Conference on Services Computing
Mendel: efficiently verifying the lineage of data modified in multiple trust domains

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Provenance for the cloud

FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Layering in provenance systems

USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
The Open Provenance Model core specification (v1.1)

Future Generation Computer Systems
Representing distributed systems using the Open Provenance Model

Future Generation Computer Systems
Policy-Based Integration of Provenance Metadata

POLICY '11 Proceedings of the 2011 IEEE International Symposium on Policies for Distributed Systems and Networks
Contextualised workflow execution in myGrid

EGC'05 Proceedings of the 2005 European conference on Advances in Grid Computing
A general-purpose provenance library

TaPP'12 Proceedings of the 4th USENIX conference on Theory and Practice of Provenance
Towards automated collection of application-level data provenance

TaPP'12 Proceedings of the 4th USENIX conference on Theory and Practice of Provenance

Cross-platform provenance

Proceedings of the Joint EDBT/ICDT 2013 Workshops
Android provenance: diagnosing device disorders

TaPP'13 Proceedings of the 5th USENIX conference on Theory and Practice of Provenance
Declaratively processing provenance metadata

TaPP'13 Proceedings of the 5th USENIX conference on Theory and Practice of Provenance
Android provenance: diagnosing device disorders

Proceedings of the 5th USENIX Workshop on the Theory and Practice of Provenance
Declaratively processing provenance metadata

Proceedings of the 5th USENIX Workshop on the Theory and Practice of Provenance

Abstract

SPADE is an open source software infrastructure for data provenance collection and management. The underlying data model used throughout the system is graph-based, consisting of vertices and directed edges that are modeled after the node and relationship types described in the Open Provenance Model. The system has been designed to decouple the collection, storage, and querying of provenance metadata. At its core is a novel provenance kernel that mediates between the producers and consumers of provenance information, and handles the persistent storage of records. It operates as a service, peering with remote instances to enable distributed provenance queries. The provenance kernel on each host handles the buffering, filtering, and multiplexing of incoming metadata from multiple sources, including the operating system, applications, and manual curation. Provenance elements can be located locally with queries that use wildcard, fuzzy, proximity, range, and Boolean operators. Ancestor and descendant queries are transparently propagated across hosts until a terminating expression is satisfied, while distributed path queries are accelerated with provenance sketches.
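The graph model and ancestor queries described in the abstract can be illustrated with a minimal sketch. It is not SPADE's actual API: the class `ProvenanceGraph`, its methods, and the sample vertices are hypothetical names chosen for illustration. The vertex and edge type names follow the Open Provenance Model, and the `stop` predicate is only an assumed stand-in for the "terminating expression" the abstract mentions.

```python
from collections import defaultdict, deque

# Node and relationship types named in the Open Provenance Model.
VERTEX_TYPES = {"Artifact", "Process", "Agent"}
EDGE_TYPES = {
    "used", "wasGeneratedBy", "wasTriggeredBy",
    "wasDerivedFrom", "wasControlledBy",
}


class ProvenanceGraph:
    """Hypothetical in-memory store; every edge points from effect to cause."""

    def __init__(self):
        self.vertices = {}               # vertex id -> {"type": ..., annotations}
        self.causes = defaultdict(list)  # effect id -> [(edge type, cause id)]

    def add_vertex(self, vid, vtype, **annotations):
        if vtype not in VERTEX_TYPES:
            raise ValueError(f"unknown vertex type: {vtype}")
        self.vertices[vid] = {"type": vtype, **annotations}

    def add_edge(self, effect, cause, etype):
        if etype not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {etype}")
        if effect not in self.vertices or cause not in self.vertices:
            raise KeyError("both endpoints must be added before the edge")
        self.causes[effect].append((etype, cause))

    def ancestors(self, start, stop=lambda vertex: False):
        """Breadth-first walk toward causes.

        A vertex for which stop(vertex) is true is reported but not
        expanded, so the walk ends along that path.
        """
        seen = set()
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for _etype, cause in self.causes.get(current, ()):
                if cause in seen:
                    continue
                seen.add(cause)
                if not stop(self.vertices[cause]):
                    queue.append(cause)
        return seen


# Sample graph (assumed data): a process reads one file and writes another.
g = ProvenanceGraph()
g.add_vertex("input.txt", "Artifact")
g.add_vertex("sort", "Process", pid=42)
g.add_vertex("output.txt", "Artifact")
g.add_vertex("alice", "Agent")
g.add_edge("sort", "input.txt", "used")
g.add_edge("output.txt", "sort", "wasGeneratedBy")
g.add_edge("sort", "alice", "wasControlledBy")

all_ancestors = g.ancestors("output.txt")
up_to_process = g.ancestors("output.txt", stop=lambda v: v["type"] == "Process")
```

Here `all_ancestors` contains the process, its input file, and the controlling agent, while `up_to_process` stops at the first process vertex. In a distributed setting, the same walk would be forwarded to a remote instance whenever a cause lives on another host. That step is omitted from this sketch.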