ISOBAR Preconditioner for Effective and High-throughput Lossless Data Compression

Authors:
Eric R. Schendel; Ye Jin; Neil Shah; Jackie Chen; C. S. Chang; Seung-Hoe Ku; Stephane Ethier; Scott Klasky; Robert Latham; Robert Ross; Nagiza F. Samatova
Venue:
ICDE '12 Proceedings of the 2012 IEEE 28th International Conference on Data Engineering
Year:
2012

Citing: 0
Cited by: 5

ISOBAR hybrid compression-I/O interleaving for large-scale parallel I/O optimization
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing

Byte-precision level of detail processing for variable precision analytics
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

When is multi-version checkpointing needed?
Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale

Assessing the effects of data compression in simulations using physically motivated metrics
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

A generic high-performance method for deinterleaving scientific data
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing

Abstract

Efficient handling of large volumes of data is a necessity for exascale scientific applications and database systems. To address the growing imbalance between the amount of available storage and the amount of data being produced by high-speed (FLOPS) processors on the system, data must be compressed to reduce the total amount of data placed on the file systems. General-purpose lossless compression frameworks, such as zlib and bzlib2, are commonly used on datasets requiring lossless compression. Quite often, however, scientific datasets compress poorly because of the negative impact of highly entropic content within the data; such datasets are referred to as hard-to-compress. An important problem in improving lossless data compression is to identify the hard-to-compress information and subsequently optimize the compression techniques at the byte level. To address this challenge, we introduce the In-Situ Orthogonal Byte Aggregate Reduction Compression (ISOBAR-compress) methodology as a preconditioner for lossless compression that identifies hard-to-compress content and optimizes the compression efficiency and throughput for hard-to-compress datasets.
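
The byte-level idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' ISOBAR implementation: the entropy test, the `threshold` value, and every function name below are assumptions made for illustration only. The sketch splits 64-bit floating-point values into byte columns, estimates the entropy of each column, passes only the low-entropy columns to zlib, and stores the high-entropy columns uncompressed.

```python
import math
import struct
import zlib
from collections import Counter


def byte_columns(values):
    """Split float64 values into 8 byte columns (column k holds byte k of every value)."""
    raw = struct.pack(f"<{len(values)}d", *values)  # little-endian, 8 bytes per value
    return [raw[k::8] for k in range(8)]


def entropy_bits(column):
    """Empirical Shannon entropy of a byte sequence, in bits per byte (0 to 8)."""
    n = len(column)
    counts = Counter(column)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def precondition_and_compress(values, threshold=7.0):
    """Compress low-entropy byte columns with zlib; keep high-entropy columns raw.

    `threshold` is a hypothetical cutoff chosen for illustration, not a value
    taken from the paper.
    """
    flags, compressible, incompressible = [], [], []
    for col in byte_columns(values):
        is_compressible = entropy_bits(col) < threshold
        flags.append(is_compressible)
        (compressible if is_compressible else incompressible).append(col)
    packed = zlib.compress(b"".join(compressible))
    return flags, packed, b"".join(incompressible)


def restore(flags, packed, raw, count):
    """Invert precondition_and_compress and return the original list of floats."""
    unpacked = zlib.decompress(packed)
    cols, ci, ri = [], 0, 0
    for is_compressible in flags:
        if is_compressible:
            cols.append(unpacked[ci:ci + count])
            ci += count
        else:
            cols.append(raw[ri:ri + count])
            ri += count
    # Re-interleave the byte columns into consecutive 8-byte values.
    data = bytes(b for row in zip(*cols) for b in row)
    return list(struct.unpack(f"<{count}d", data))
```

For values that share a sign and exponent but differ in their low-order mantissa bits, a sketch like this would be expected to flag the high-order byte columns as compressible and leave the noisy low-order columns untouched, so that the compressor spends no time on bytes it cannot shrink.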