Recording and using provenance in a protein compressibility experiment

Authors:
P. Groth; S. Miles; Weijian Fang; S. C. Wong; K.-P. Zauner; L. Moreau
Affiliations:
Sch. of Electron. & Comput. Sci., Southampton Univ., UK (all authors)
Venue:
HPDC '05: Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing (HPDC-14), 2005
Year:
2005

Citing: 0
Cited by: 25

Provenance management in curated databases
Proceedings of the 2006 ACM SIGMOD international conference on Management of data

PrIMe: a software engineering methodology for developing provenance-aware applications
Proceedings of the 6th international workshop on Software engineering and middleware

Provenance-based validation of e-science experiments
Web Semantics: Science, Services and Agents on the World Wide Web

Tracing lineage beyond relational operators
VLDB '07 Proceedings of the 33rd international conference on Very large data bases

Efficient provenance storage
Proceedings of the 2008 ACM SIGMOD international conference on Management of data

A Dataflow-Oriented Atomicity and Provenance System for Pipelined Scientific Workflows
ICCS '07 Proceedings of the 7th international conference on Computational Science, Part III: ICCS 2007

A model of process documentation to determine provenance in mash-ups
ACM Transactions on Internet Technology (TOIT)

Implementation and Evaluation of a Protocol for Recording Process Documentation in the Presence of Failures
Provenance and Annotation of Data and Processes

Atomicity and provenance support for pipelined scientific workflows
Future Generation Computer Systems

Recording Process Documentation in the Presence of Failures
Methods, Models and Tools for Fault Tolerance

Capturing custom link semantics among heterogeneous artifacts and tools
TEFSE '09 Proceedings of the 2009 ICSE Workshop on Traceability in Emerging Forms of Software Engineering

Why not?
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

Do You Know Where Your Data's Been? --- Tamper-Evident Database Provenance
SDM '09 Proceedings of the 6th VLDB Workshop on Secure Data Management

RDFProv: A relational RDF store for querying and managing scientific workflow provenance
Data & Knowledge Engineering

The Foundations for Provenance on the Web
Foundations and Trends in Web Science

PrIMe: A methodology for developing provenance-aware applications
ACM Transactions on Software Engineering and Methodology (TOSEM)

Efficient storage and temporal query evaluation in hierarchical data archiving systems
SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management

Provenance-based validation of e-science experiments
ISWC'05 Proceedings of the 4th international conference on The Semantic Web

Multi-unit combinatorial reverse auctions with transformability relationships among goods
WINE'05 Proceedings of the First international conference on Internet and Network Economics

Managing rapidly-evolving scientific workflows
IPAW'06 Proceedings of the 2006 international conference on Provenance and Annotation of Data

A provenance model for manually curated data
IPAW'06 Proceedings of the 2006 international conference on Provenance and Annotation of Data

Performance evaluation of the karma provenance framework for scientific workflows
IPAW'06 Proceedings of the 2006 international conference on Provenance and Annotation of Data

A hybrid approach for efficient provenance storage
Proceedings of the 21st ACM international conference on Information and knowledge management

Evaluation of a Hybrid Approach for Efficient Provenance Storage
ACM Transactions on Storage (TOS)

Automated data provenance capture in spreadsheets, with case studies
Future Generation Computer Systems

Abstract

Very large scale computations are now routinely used as a methodology for scientific research. In this context, 'provenance systems' are regarded as the equivalent of the scientist's logbook for in silico experimentation: provenance captures the documentation of the process that led to some result. Using a protein compressibility analysis application, we derive a set of generic use cases for a provenance system. To support these, we address the following fundamental questions: What is provenance? How should it be recorded? What is the performance impact on grid execution? What is the performance of reasoning over it? In doing so, we define a technology-independent notion of provenance that captures interactions between components, internal component information and groupings of interactions, so that we can analyze and reason about the execution of scientific processes. To support persistent provenance in heterogeneous applications, we introduce a separate provenance store, in which provenance documentation can be stored, archived and queried independently of the technology used to run the application. Through a series of practical tests, we evaluate the performance impact of such a provenance system. In summary, we demonstrate that the provenance recording overhead of our prototype system remains under 10% of execution time, and we show that the recorded information supports our use cases with acceptable query performance.
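As a rough illustration of the three kinds of documentation the abstract names (interactions between components, internal component information, and groupings of interactions) and of a store that is queried separately from the application, the following is a minimal in-memory sketch. It is not the authors' implementation: the class names, field names, identifiers and the example component names are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Interaction:
    """One message passed between two components (hypothetical schema)."""
    interaction_id: str
    sender: str
    receiver: str
    payload: str


@dataclass
class ProvenanceStore:
    """In-memory stand-in for a separate provenance store (assumption)."""
    interactions: Dict[str, Interaction] = field(default_factory=dict)
    # Internal component information, keyed by the interaction it relates to.
    component_info: Dict[str, List[dict]] = field(default_factory=dict)
    # Named groups of interaction ids, e.g. one experiment run.
    groups: Dict[str, List[str]] = field(default_factory=dict)

    def record_interaction(self, interaction: Interaction, group: str) -> None:
        """Store an interaction and add it to the named group."""
        self.interactions[interaction.interaction_id] = interaction
        self.groups.setdefault(group, []).append(interaction.interaction_id)

    def record_component_info(self, interaction_id: str, component: str,
                              info: dict) -> None:
        """Attach internal information from a component to an interaction."""
        if interaction_id not in self.interactions:
            raise KeyError(f"unknown interaction: {interaction_id}")
        entry = {"component": component, **info}
        self.component_info.setdefault(interaction_id, []).append(entry)

    def query_group(self, group: str) -> List[Interaction]:
        """Return all interactions recorded for a group, in recording order."""
        return [self.interactions[i] for i in self.groups.get(group, [])]


# Usage: record two steps of a made-up run, then query them back.
store = ProvenanceStore()
store.record_interaction(
    Interaction("i1", "workflow", "encoder", "sample-1"), group="run-1")
store.record_interaction(
    Interaction("i2", "encoder", "compressor", "encoded-sample-1"), group="run-1")
store.record_component_info("i2", "compressor", {"setting": "placeholder"})

trace = store.query_group("run-1")
print([(i.sender, i.receiver) for i in trace])
```

Because the store only holds plain records keyed by identifiers, the application producing them could be written in any technology; that independence is the property the sketch is meant to illustrate, under the assumptions stated above.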