Evaluation of a Hybrid Approach for Efficient Provenance Storage

  • Authors:
  • Yulai Xie;Kiran-Kumar Muniswamy-Reddy;Dan Feng;Yan Li;Darrell D. E. Long

  • Affiliations:
  • Huazhong University of Science and Technology;Harvard University;Huazhong University of Science and Technology;University of California, Santa Cruz;University of California, Santa Cruz

  • Venue:
  • ACM Transactions on Storage (TOS)
  • Year:
  • 2013

Abstract

Provenance is the metadata that describes the history of objects. Provenance provides new functionality in a variety of areas, including experimental documentation, debugging, search, and security. As a result, a number of groups have built systems to capture provenance. Most of these systems focus on provenance collection, and a few focus on building applications that use the provenance, but all of them ignore an important aspect: efficient long-term storage of provenance. In this article, we first analyze the provenance collected from multiple workloads and characterize the properties of provenance with respect to long-term storage. We then propose a hybrid scheme that takes advantage of both the graph structure of provenance data and its inherent duplication. Our evaluation indicates that our hybrid scheme, a combination of Web graph compression (adapted for provenance) and dictionary encoding, provides the best trade-off among compression ratio, compression time, and query performance when compared to other compression schemes.
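
To make the two ingredients named in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes provenance records carry repeated string attributes (suited to dictionary encoding) and that each node's ancestors are stored as a list of integer node IDs (suited to gap encoding, one building block commonly used in Web graph compression). All function names, paths, and IDs here are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each string with an index into a table of distinct strings."""
    table: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in index:
            index[value] = len(table)
            table.append(value)
        codes.append(index[value])
    return table, codes


def dictionary_decode(table: List[str], codes: List[int]) -> List[str]:
    """Recover the original strings from the table and the index codes."""
    return [table[code] for code in codes]


def gap_encode(neighbors: List[int]) -> List[int]:
    """Store a sorted ID list as the first ID followed by successive differences."""
    gaps: List[int] = []
    previous = 0
    for node_id in sorted(neighbors):
        gaps.append(node_id - previous)
        previous = node_id
    return gaps


def gap_decode(gaps: List[int]) -> List[int]:
    """Rebuild the sorted ID list by taking running sums of the gaps."""
    ids: List[int] = []
    total = 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


# Hypothetical attribute values with heavy duplication.
names = ["/usr/bin/gcc", "/usr/bin/gcc", "/lib/libc.so", "/usr/bin/gcc"]
table, codes = dictionary_encode(names)

# Hypothetical ancestor IDs of one provenance node.
ancestors = [1002, 1000, 1001, 1010]
gaps = gap_encode(ancestors)
```

In this sketch the duplicated strings collapse to a two-entry table plus small integer codes, and the clustered ancestor IDs become mostly small gaps, which a variable-length integer code could then store compactly. How the article actually adapts Web graph compression to provenance is not described in the abstract, so the details above are assumptions.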