A hybrid approach for efficient provenance storage

  • Authors:
  • Yulai Xie;Dan Feng;Zhipeng Tan;Lei Chen;Kiran-Kumar Muniswamy-Reddy;Yan Li;Darrell D.E. Long

  • Affiliations:
  • Huazhong University of Science and Technology, Wuhan, China;Huazhong University of Science and Technology, Wuhan, China;Huazhong University of Science and Technology, Wuhan, China;Huazhong University of Science and Technology, Wuhan, China;Harvard University, Boston, MA, USA;University of California, Santa Cruz, Santa Cruz, CA, USA;University of California, Santa Cruz, Santa Cruz, CA, USA

  • Venue:
  • Proceedings of the 21st ACM international conference on Information and knowledge management
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Efficient provenance storage is an essential step towards the adoption of provenance. In this paper, we analyze the provenance collected from multiple workloads with a view towards efficient storage. Based on our analysis, we characterize the properties of provenance with respect to long term storage. We then propose a hybrid scheme that takes advantage of the graph structure of provenance data and the inherent duplication in provenance data. Our evaluation indicates that our hybrid scheme, a combination of web graph compression (adapted for provenance) and dictionary encoding, provides the best tradeoff in terms of compression ratio, compression time and query performance when compared to other compression schemes.