Provenance is metadata that describes the history of objects. It enables new functionality in a variety of areas, including experimental documentation, debugging, search, and security. As a result, a number of groups have built systems to capture provenance. Most of these systems focus on provenance collection, and a few focus on building applications that use the provenance, but all of them ignore an important aspect: efficient long-term storage of provenance. In this article, we first analyze the provenance collected from multiple workloads and characterize its properties with respect to long-term storage. We then propose a hybrid scheme that takes advantage of both the graph structure of provenance data and its inherent duplication. Our evaluation indicates that our hybrid scheme, a combination of Web graph compression (adapted for provenance) and dictionary encoding, provides the best trade-off among compression ratio, compression time, and query performance when compared to other compression schemes.
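The two ingredients named above can be illustrated with a minimal sketch. It is not the scheme evaluated in the article: it assumes provenance nodes carry integer IDs and string attributes, and it substitutes plain gap encoding of sorted ancestor lists (one common building block of Web graph compression) for the adapted Web graph compression. All function names are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each string with an integer code; repeated strings share a code."""
    codes: Dict[str, int] = {}
    encoded: List[int] = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)  # next unused code
        encoded.append(codes[value])
    # dicts keep insertion order, so position i holds the string for code i
    return list(codes), encoded


def dictionary_decode(table: List[str], encoded: List[int]) -> List[str]:
    """Look each code up in the table to recover the original strings."""
    return [table[code] for code in encoded]


def gap_encode(ancestors: List[int]) -> List[int]:
    """Sort and deduplicate the IDs, then store the first ID and successive gaps."""
    ordered = sorted(set(ancestors))
    return [b - a for a, b in zip([0] + ordered, ordered)]


def gap_decode(gaps: List[int]) -> List[int]:
    """Rebuild the sorted ID list by accumulating the gaps."""
    ids: List[int] = []
    current = 0
    for gap in gaps:
        current += gap
        ids.append(current)
    return ids


if __name__ == "__main__":
    # Hypothetical attribute values with the kind of duplication described above.
    names = ["/usr/bin/gcc", "/usr/bin/gcc", "/bin/sh", "/usr/bin/gcc"]
    table, encoded = dictionary_encode(names)
    print(table, encoded)
    # Hypothetical ancestor IDs of one provenance node.
    print(gap_encode([105, 100, 101]))
```

Small gaps and small dictionary codes are cheap to store with a variable-length integer code, which is where the space saving would come from in this illustration.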