A key advantage of scientific workflow systems over traditional scripting approaches is their ability to automatically record the data and process dependencies introduced during workflow runs. This information is often represented as provenance graphs, which scientists can use to better understand, reproduce, and verify scientific results. However, while most systems record and store data and process dependencies, few provide easy-to-use and efficient approaches for accessing and querying provenance information. Instead, users formulate provenance graph queries directly against physical data representations (e.g., relational, XML, or RDF), leading to queries that are difficult to express and expensive to evaluate. We address these problems through a high-level query language tailored for expressing provenance graph queries. The language is based on a general model of provenance supporting scientific workflows that process XML data and employ update semantics. Query constructs are provided for querying both structure and lineage information. Unlike other languages that return sets of nodes as answers, our query language is closed: answers to lineage queries are sets of lineage dependencies (edges), so answers can themselves be queried further. We provide a formal semantics for the language and present novel techniques for efficiently evaluating lineage queries. Experimental results on real and synthetic provenance traces demonstrate that our lineage-based optimizations outperform both an in-memory implementation and a standard database implementation by orders of magnitude. We also show that our strategies are feasible and can significantly reduce both provenance storage size and query execution time when compared with standard approaches.
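The closure property described above can be illustrated with a minimal Python sketch. It is not the query language or the evaluation technique of this work: the function `lineage_edges`, the edge encoding, and the sample trace are all hypothetical and serve only to show why returning edges rather than nodes lets one answer feed another query.

```python
from collections import defaultdict

# Hypothetical encoding: an edge (source, target) means "target depends on source".
Edge = tuple[str, str]


def lineage_edges(edges: set[Edge], node: str) -> set[Edge]:
    """Return every dependency edge that lies on a path leading into `node`."""
    incoming: defaultdict[str, set[str]] = defaultdict(set)
    for src, dst in edges:
        incoming[dst].add(src)

    result: set[Edge] = set()
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for src in incoming[current]:
            result.add((src, current))
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return result


# Hypothetical provenance trace, invented for illustration only.
trace: set[Edge] = {
    ("raw", "cleaned"),
    ("cleaned", "aligned"),
    ("reference", "aligned"),
    ("aligned", "plot"),
    ("raw", "summary"),
}

# First query: the lineage of "plot", returned as a set of edges.
plot_lineage = lineage_edges(trace, "plot")

# Second query runs over the first answer, which is possible because the
# answer is itself a graph (a set of edges) rather than a bare set of nodes.
cleaned_within_plot = lineage_edges(plot_lineage, "cleaned")
```

Here the second call takes the first answer as its input graph. A node-set answer would have discarded the dependency structure that the follow-up query needs.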