Understanding provenance black boxes

Authors:
Adriane Chapman;H. V. Jagadish
Affiliations:
The MITRE Corporation, McLean, USA;University of Michigan, Ann Arbor, USA
Venue:
Distributed and Parallel Databases
Year:
2010



Abstract

Current provenance stores associated with workflow management systems (WfMSs) capture enough coarse-grained information to describe which datasets were used and which processes were run. While this information is enough to rebuild a workflow run, it is not enough to facilitate user understanding. Because the data is manipulated via a series of black boxes, it is often impossible for a human to understand what happened to the data. In this work, we highlight the missing information that can assist user understanding. Unfortunately, provenance information is already very complex and difficult for a user to comprehend, and adding the extra information needed for deeper black-box understanding can exacerbate the problem. To alleviate this, we develop a model of provenance answers that follow a "roll up", "drill down" strategy. We evaluate these techniques to determine whether users gain a better understanding of provenance information. We show how this information can be captured by workflow management systems, and that the structures and information needed for this model are a negligible addition to standard provenance stores. Finally, we implement these techniques in a real provenance system and evaluate implementation feasibility.