Data lineage and data provenance are key to the management of scientific data. Without knowing the exact provenance and processing pipeline used to produce a derived data set, the data set is often useless from a scientific point of view. On the positive side, capturing provenance information is facilitated by the widespread use of workflow tools for processing scientific data. The workflow process describes all the steps involved in producing a given data set and hence captures its lineage. On the negative side, efficiently storing and querying workflow-based data lineage is not trivial. All existing solutions use recursive queries, and even recursive tables, to represent the workflows. Such solutions do not scale and are rather inefficient. In this paper we propose an alternative approach to storing lineage information captured as a workflow process. We use a space- and query-efficient interval representation for dependency graphs and show how to transform arbitrary workflow processes into graphs that can be stored using such a representation. We also characterize the problem in terms of its overall complexity and provide a comprehensive performance evaluation of the approach.
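To make the idea of an interval representation concrete, the following is a minimal sketch of one common interval labeling, not necessarily the encoding used in the paper. It assumes the dependency structure is a tree in which each node points to the items it was derived from; the function names (`label_intervals`, `depends_on`, `lineage`) and the sample data set names are hypothetical. Each node receives a (start, end) pair from a depth-first traversal, so a lineage question becomes an interval-containment test rather than a recursive query.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def label_intervals(sources: Dict[str, List[str]], root: str) -> Dict[str, Interval]:
    """Assign each node a (start, end) interval via an iterative depth-first walk.

    `sources` maps a node to the nodes it was directly derived from.
    Assumption: the structure reachable from `root` is a tree.
    """
    intervals: Dict[str, Interval] = {}
    start: Dict[str, int] = {}
    counter = 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        counter += 1
        if finished:
            # All inputs of `node` have been numbered: close its interval.
            intervals[node] = (start[node], counter)
        else:
            start[node] = counter
            stack.append((node, True))
            for src in reversed(sources.get(node, [])):
                stack.append((src, False))
    return intervals


def depends_on(intervals: Dict[str, Interval], derived: str, source: str) -> bool:
    """True if `source` is in the lineage of `derived` (strict containment)."""
    d_start, d_end = intervals[derived]
    s_start, s_end = intervals[source]
    return d_start < s_start and s_end < d_end


def lineage(intervals: Dict[str, Interval], derived: str) -> List[str]:
    """All nodes whose interval lies inside that of `derived`, in start order."""
    inside = [n for n in intervals if depends_on(intervals, derived, n)]
    return sorted(inside, key=lambda n: intervals[n][0])


# Hypothetical workflow: "result" is derived from "filtered" and "calibration",
# and "filtered" is derived from "raw".
sources = {"result": ["filtered", "calibration"], "filtered": ["raw"]}
intervals = label_intervals(sources, "result")
```

With these labels, `lineage(intervals, "result")` is answered by a single pass of interval comparisons. In a relational store the same test could be expressed as a range predicate over two integer columns. Handling general workflow graphs, where a node may feed several downstream steps, requires the graph transformation that the abstract refers to and is not covered by this sketch.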