Efficient querying of distributed provenance stores

Authors:
Ashish Gehani; Minyoung Kim; Tanu Malik
Affiliations:
SRI International, Menlo Park, CA; SRI International, Menlo Park, CA; Purdue University, West Lafayette, IN
Venue:
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Year:
2010

Abstract

Current projects that automate the collection of provenance information use a centralized architecture for managing the resulting metadata; that is, provenance is gathered at remote hosts and submitted to a central provenance management service. In contrast, we are developing a completely decentralized system in which each computer maintains the authoritative repository of the provenance gathered on it. Our model has several advantages: it scales to large volumes of generated metadata, provides low-latency access to provenance metadata about local data, avoids the need to synchronize with a central service after operating while disconnected from the network, and lets users retain control over their data provenance records. We describe the SPADE project's support for tracking data provenance in distributed environments, including how queries can be optimized with provenance sketches, pre-caching, and caching.