As research data products are increasingly the output of complex processes, the lineage of those products takes on greater importance as a description of the processes that contributed to the result. Without an adequate description of data products, their reuse is diminished. Instrumenting an application for provenance capture is burdensome, however. This paper explores the option of deriving provenance from existing log files, an approach that substantially reduces the instrumentation task but raises questions about sifting through huge amounts of information for what may or may not be complete provenance. We study the tradeoff between ease of capture and provenance completeness, and show that under some circumstances capture through logs can result in high-quality provenance.