Harvesting provenance for streaming workflows presents challenges related to the high rate of updates and the wide distribution of the execution, which can span several institutional infrastructures. Moreover, the large volume of data typically produced by each transformation step cannot always be stored and preserved efficiently. This can hinder the evaluation of results, for instance in real time, and points to the importance of customisable metadata extraction procedures. In this paper we present our approach to these provenance challenges within a use-case-driven scenario in seismology, which requires executing processing pipelines over a large data stream. In particular, we discuss the current implementation and the upcoming challenges for an in-workflow programmatic approach to provenance tracing, building on composite functions, selective recording and domain-specific metadata production.
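The abstract names three building blocks for in-workflow provenance tracing: composite functions, selective recording and domain-specific metadata production. As a minimal in-process sketch, assuming each processing step is a plain Python function over one data item, they might fit together as follows. All names here (`DataItem`, `ProvenanceTracer`, `traced`, `compose`, the `demean`/`scale`/`detect` steps and the 1.5 threshold) are hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical sketch of in-workflow provenance tracing; not the paper's implementation.
import functools
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class DataItem:
    """A unit of streamed data (e.g. a seismic trace) with a unique id."""
    value: Any
    item_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # Ids of the nearest recorded ancestors; a fresh source item is its own ancestor.
    lineage: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.lineage:
            self.lineage = [self.item_id]

@dataclass
class ProvRecord:
    """One recorded derivation: which ancestors produced which output."""
    step: str
    inputs: List[str]
    output: str
    metadata: Dict[str, Any]
    timestamp: float

class ProvenanceTracer:
    """Collects provenance records emitted by traced processing steps."""

    def __init__(self) -> None:
        self.records: List[ProvRecord] = []

    def traced(self, step: str,
               extract: Callable[[Any], Dict[str, Any]] = lambda v: {},
               record_if: Callable[[Any], bool] = lambda v: True):
        """Turn a value->value function into a DataItem->DataItem traced step.

        extract:   domain-specific metadata computed from the output value
        record_if: selective-recording predicate evaluated on the output value
        """
        def decorator(fn: Callable[[Any], Any]) -> Callable[[DataItem], DataItem]:
            @functools.wraps(fn)
            def wrapper(item: DataItem) -> DataItem:
                out = DataItem(fn(item.value))
                if record_if(out.value):
                    # Metadata is extracted only for steps that are actually recorded.
                    self.records.append(ProvRecord(step, list(item.lineage), out.item_id,
                                                   extract(out.value), time.time()))
                else:
                    # Skipped step: forward the ancestry so the trace stays connected.
                    out.lineage = list(item.lineage)
                return out
            return wrapper
        return decorator

def compose(*steps: Callable[[DataItem], DataItem]) -> Callable[[DataItem], DataItem]:
    """Chain traced steps into one composite function, applied left to right."""
    def composite(item: DataItem) -> DataItem:
        for s in steps:
            item = s(item)
        return item
    return composite

# Hypothetical demo steps on a trace stored as a list of floats.
tracer = ProvenanceTracer()

def peak(trace: List[float]) -> float:
    return max(abs(x) for x in trace)

@tracer.traced("demean")
def demean(trace):
    mean = sum(trace) / len(trace)
    return [x - mean for x in trace]

@tracer.traced("scale", record_if=lambda t: False)  # cheap step, never recorded
def scale(trace):
    return [2.0 * x for x in trace]

@tracer.traced("detect", extract=lambda t: {"peak_amplitude": peak(t)},
               record_if=lambda t: peak(t) > 1.5)  # record only "interesting" output
def detect(trace):
    return trace
```

With a source `DataItem([1.0, 2.0, 3.0])`, running `compose(demean, scale, detect)` leaves two records: one for `demean`, and one for `detect` whose inputs point straight back to the `demean` output, because the unrecorded `scale` step forwarded its ancestry. Forwarding lineage keeps the derivation graph connected at the cost of hiding intermediate steps; a distributed deployment would additionally need durable record storage and identifiers that can be correlated across processes and institutions, which this in-memory sketch does not attempt.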