In many domains of science, engineering, and commerce, data analysis systems derive new data (and ultimately, one hopes, knowledge) from datasets that describe experimental results or simulated phenomena. To support such analyses, we have developed a “virtual data system” that allows users first to define, then to invoke, and finally to explore the provenance of procedures (and of workflows comprising multiple procedure calls) that perform such data derivations. The underlying execution model is “functional” in the sense that procedures read (but do not modify) their inputs and produce outputs via deterministic computations. This property makes it straightforward for the virtual data system to record not only the recipe for producing any given data object but also enough information about the environment in which that recipe was executed, all with sufficient fidelity that the steps used to create a data object can be re-executed to reproduce it at a later time or at a different location. The virtual data system maintains this information in an integrated schema alongside semantic annotations. This integration enables a powerful query capability: the rich semantic information implied by knowledge of the structure of data derivation procedures can be exploited to provide an information environment that fuses recipe, history, and application-specific semantics. Here we provide an overview of this integration, the queries and transformations it enables, and examples of how these capabilities can serve scientific processes.
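The functional model described above (pure, deterministic procedures whose invocations are recorded together with their environment and with annotations) can be illustrated with a minimal in-memory sketch. This is not the virtual data system's actual schema or API: every name here (`ProvenanceStore`, `Invocation`, `invoke`, `lineage`, `query`, `reproduce`) is hypothetical, and the recorded "environment" is only the Python version and platform, standing in for the much richer context a real system would capture. The sketch assumes procedures are deterministic and side-effect free, which is what lets replaying a recorded recipe reproduce the same data object.

```python
import copy
import platform
import sys
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Invocation:
    """One recorded procedure call: the recipe plus where it ran."""
    procedure: str                            # name of the defined procedure
    inputs: Tuple[str, ...]                   # ids of the data objects it read
    output: str                               # id of the data object it derived
    environment: Tuple[Tuple[str, str], ...]  # hypothetical minimal context


class ProvenanceStore:
    """Hypothetical in-memory store keeping data, recipes, and annotations together."""

    def __init__(self) -> None:
        self.procedures: Dict[str, Callable] = {}
        self.data: Dict[str, object] = {}
        self.derivations: Dict[str, Invocation] = {}      # output id -> invocation
        self.annotations: Dict[str, Dict[str, str]] = {}  # object id -> tags

    def define(self, name: str, fn: Callable) -> None:
        """Register a procedure; assumed deterministic and side-effect free."""
        self.procedures[name] = fn

    def put(self, obj_id: str, value: object) -> None:
        """Store a raw input that has no recorded derivation."""
        self.data[obj_id] = value

    def invoke(self, name: str, inputs: List[str], output: str) -> object:
        """Run a procedure and record the invocation alongside its result."""
        # Deep copies ensure the procedure cannot modify stored inputs.
        args = [copy.deepcopy(self.data[i]) for i in inputs]
        value = self.procedures[name](*args)
        self.data[output] = value
        env = (("python", platform.python_version()), ("platform", sys.platform))
        self.derivations[output] = Invocation(name, tuple(inputs), output, env)
        return value

    def annotate(self, obj_id: str, **tags: str) -> None:
        """Attach application-specific key/value annotations to an object."""
        self.annotations.setdefault(obj_id, {}).update(tags)

    def lineage(self, obj_id: str) -> List[Invocation]:
        """Invocations upstream of obj_id, depth-first, each listed once."""
        steps: List[Invocation] = []
        seen = set()
        stack = [obj_id]
        while stack:
            current = stack.pop()
            inv = self.derivations.get(current)
            if inv is None or current in seen:
                continue  # raw input, or an object already visited
            seen.add(current)
            steps.append(inv)
            stack.extend(inv.inputs)
        return steps

    def query(self, procedure: str, **tags: str) -> List[str]:
        """Objects derived by `procedure` whose annotations match every tag."""
        return [
            out
            for out, inv in self.derivations.items()
            if inv.procedure == procedure
            and all(self.annotations.get(out, {}).get(k) == v for k, v in tags.items())
        ]

    def reproduce(self, obj_id: str) -> object:
        """Recompute obj_id by replaying recorded recipes back to raw inputs."""
        inv = self.derivations.get(obj_id)
        if inv is None:
            return copy.deepcopy(self.data[obj_id])
        args = [self.reproduce(i) for i in inv.inputs]
        return self.procedures[inv.procedure](*args)


if __name__ == "__main__":
    store = ProvenanceStore()
    store.define("calibrate", lambda xs: [x * 2 for x in xs])
    store.define("total", sum)
    store.put("raw", [1, 2, 3])
    store.invoke("calibrate", ["raw"], "cal")
    store.invoke("total", ["cal"], "sum")
    store.annotate("sum", experiment="run-7")
    print([inv.procedure for inv in store.lineage("sum")])  # ['total', 'calibrate']
    print(store.query("total", experiment="run-7"))         # ['sum']
    print(store.reproduce("sum") == store.data["sum"])      # True
```

The `query` method is where recipe, history, and annotations meet: it selects objects both by the procedure that derived them and by their application-specific tags. Because `reproduce` replays only recorded recipes, it yields the same object only under the determinism assumption; a real system would also need to check that the recorded environment is compatible before re-executing.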