Scientific workflow management systems are increasingly providing the ability to manage and query the provenance of data products. However, the problem of differencing the provenance of two data products produced by executions of the same specification has not been adequately addressed. Although this problem is NP-hard for general workflow specifications, an analysis of real scientific (and business) workflows shows that their specifications can be captured as series-parallel graphs overlaid with well-nested forking and looping. For this natural restriction, we present efficient, polynomial-time algorithms for differencing executions of the same specification and thereby understanding the difference in the provenance of their data products. We then describe a prototype called PDiffView built around our differencing algorithm. Experimental results demonstrate the scalability of our approach using collected, real workflows and increasingly complex runs.
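
To make the series-parallel restriction concrete, the following minimal sketch tests whether a small two-terminal directed acyclic graph is series-parallel by repeatedly applying the textbook series and parallel reductions. It is not the differencing algorithm described above, and it ignores the forking and looping overlay; the function name `is_series_parallel` and the edge-list representation are assumptions made purely for illustration.

```python
from collections import Counter


def is_series_parallel(edges, source, sink):
    """Return True if the DAG given by `edges` reduces to the single edge
    (source, sink) under series and parallel reductions.

    Assumes the input is acyclic and every edge is a (tail, head) pair.
    """
    # Storing edges in a set performs parallel reduction implicitly:
    # duplicate (u, v) edges collapse into one.
    current = set(edges)
    changed = True
    while changed:
        changed = False
        indeg = Counter(head for _, head in current)
        outdeg = Counter(tail for tail, _ in current)
        for node in list(indeg):
            if node in (source, sink):
                continue
            if indeg[node] == 1 and outdeg[node] == 1:
                (u,) = [a for a, b in current if b == node]
                (w,) = [b for a, b in current if a == node]
                if u == node:
                    # Self-loop: not a DAG, so no series reduction applies.
                    continue
                # Series reduction: replace u -> node -> w with u -> w.
                current -= {(u, node), (node, w)}
                current.add((u, w))
                changed = True
                break
    return current == {(source, sink)}


# A fork-join "diamond" is series-parallel.
diamond = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
# Adding a cross edge a -> b gives a bridge-like graph that is not.
bridge = diamond + [("a", "b")]

diamond_ok = is_series_parallel(diamond, "s", "t")
bridge_ok = is_series_parallel(bridge, "s", "t")
```

Here the diamond reduces to a single edge, while the cross edge in the second graph blocks every reduction. The scan is quadratic in the worst case, which is acceptable for a toy example; it only illustrates the structural property in question.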