A generic high-performance method for deinterleaving scientific data

Authors:
Eric R. Schendel;Steve Harenberg;Houjun Tang;Venkatram Vishwanath;Michael E. Papka;Nagiza F. Samatova
Affiliations:
North Carolina State University, Raleigh, NC and Argonne National Laboratory, Argonne, IL and Oak Ridge National Laboratory, Oak Ridge, TN;North Carolina State University, Raleigh, NC and Oak Ridge National Laboratory, Oak Ridge, TN;North Carolina State University, Raleigh, NC and Oak Ridge National Laboratory, Oak Ridge, TN;Argonne National Laboratory, Argonne, IL;Northern Illinois University, DeKalb, IL and Argonne National Laboratory, Argonne, IL;North Carolina State University, Raleigh, NC and Oak Ridge National Laboratory, Oak Ridge, TN
Venue:
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Year:
2013

Abstract

High-performance, energy-efficient data management applications are a necessity for HPC systems because of the extreme scale of data produced by the high-fidelity scientific simulations these systems support. Data layout in memory strongly affects performance. To run faster, most simulations interleave variables in memory during their calculation phase, then deinterleave the data for subsequent storage and analysis. Efficient data deinterleaving is therefore critical, yet common deinterleaving methods deliver poor throughput and energy performance. To address this problem, we propose a deinterleaving method that is high performance, energy efficient, and generic to any data type. To the best of our knowledge, this is the first deinterleaving method that 1) exploits data cache prefetching, 2) reduces memory accesses, and 3) optimizes the use of complete cache line writes. When evaluated against conventional deinterleaving methods on 105 STREAM standard micro-benchmarks, our method always improved throughput and throughput/watt on multi-core systems. In the best case, our deinterleaving method improved throughput by up to 26.2x and throughput/watt by up to 7.8x.
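To make the operation concrete, the following is a minimal sketch of a conventional, type-agnostic deinterleave of the kind the abstract compares against; it is not the authors' method. The function name `deinterleave` and the parameters `num_vars` and `elem_size` are assumptions introduced here for illustration. The sketch treats the input as fixed-size records, each holding one element of every variable, and gathers each variable's elements into its own contiguous buffer with a strided pass.

```python
import struct
from typing import List


def deinterleave(buf: bytes, num_vars: int, elem_size: int) -> List[bytes]:
    """Split interleaved records into one contiguous buffer per variable.

    Illustrative baseline only: one strided pass over the input per
    variable, with no prefetching or cache-line-aware writes.
    """
    record_size = num_vars * elem_size
    if len(buf) % record_size != 0:
        raise ValueError("buffer length is not a whole number of records")

    view = memoryview(buf)
    outputs = []
    for v in range(num_vars):
        dst = bytearray()
        # Element v of every record starts at offset v * elem_size
        # and repeats every record_size bytes.
        for start in range(v * elem_size, len(buf), record_size):
            dst += view[start:start + elem_size]
        outputs.append(bytes(dst))
    return outputs


# Two records of three interleaved doubles: (x0, y0, z0, x1, y1, z1).
interleaved = struct.pack("<6d", 1.0, 10.0, 100.0, 2.0, 20.0, 200.0)
x, y, z = deinterleave(interleaved, num_vars=3, elem_size=8)
print(struct.unpack("<2d", x), struct.unpack("<2d", y), struct.unpack("<2d", z))
```

Because the function works on raw bytes and an element size, it is independent of the data type, which mirrors the "generic to any data type" requirement. Its strided reads are the access pattern the abstract describes as inefficient.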