Inter-array Data Regrouping

Authors:
Chen Ding; Ken Kennedy
Venue:
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Year:
1999

Abstract

As the speed gap between CPU and memory widens, the memory hierarchy has become the performance bottleneck for most applications because of both the high latency and the low bandwidth of direct memory access. With the recent introduction of latency-hiding strategies on modern machines, limited memory bandwidth has become the primary performance constraint, and the effective use of available memory bandwidth has consequently become critical. Since memory data is transferred one cache block at a time, improving the utilization of cache blocks can directly improve memory bandwidth utilization and program performance. However, existing optimizations do not maximize cache-block utilization because they are intra-array; that is, they improve data reuse only within single arrays and do not group useful data from multiple arrays into the same cache block. In this paper, we present inter-array data regrouping, a global data transformation that first splits and then selectively regroups all data arrays in a program. The new transformation is optimal in the sense that it exploits inter-array cache-block reuse when and only when it is always profitable. When evaluated on real-world programs with both regular contiguous data access and irregular, dynamic data access, inter-array data regrouping transforms as many as 26 arrays in a program and improves overall performance by as much as 32%.
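
The cache-block argument in the abstract can be illustrated with a small counting model. This is a toy sketch rather than the paper's algorithm: it only counts how many distinct cache blocks are touched when a loop reads element i of every array, first with each array stored separately and then with the arrays interleaved element by element. The 64-byte block size, the 8-byte element size, and the function names are assumptions made for the illustration and do not come from the paper.

```python
BLOCK_BYTES = 64  # assumed cache-block size (not from the paper)
ELEM_BYTES = 8    # assumed element size, e.g. a double (not from the paper)


def blocks_touched_separate(indices, n, num_arrays=2):
    """Count distinct cache blocks touched when each array of length n
    is stored contiguously, one array after another."""
    blocks = set()
    for k in range(num_arrays):
        base = k * n * ELEM_BYTES  # start address of array k
        for i in indices:
            blocks.add((base + i * ELEM_BYTES) // BLOCK_BYTES)
    return len(blocks)


def blocks_touched_regrouped(indices, num_arrays=2):
    """Count distinct cache blocks touched when the arrays are interleaved,
    so that element i of every array sits side by side in memory."""
    blocks = set()
    for i in indices:
        for k in range(num_arrays):
            addr = (i * num_arrays + k) * ELEM_BYTES
            blocks.add(addr // BLOCK_BYTES)
    return len(blocks)


if __name__ == "__main__":
    n = 1024
    irregular = [0, 100, 200, 300]  # scattered accesses
    contiguous = range(n)           # full sequential sweep

    print("irregular, separate: ", blocks_touched_separate(irregular, n))
    print("irregular, regrouped:", blocks_touched_regrouped(irregular))
    print("contiguous, separate: ", blocks_touched_separate(contiguous, n))
    print("contiguous, regrouped:", blocks_touched_regrouped(contiguous))
```

Under these assumptions, the scattered accesses touch 8 blocks with separate arrays and 4 blocks after regrouping, because each fetched block now carries useful data from both arrays. The contiguous sweep touches 256 blocks in either layout, so in this simple model interleaving two arrays that are always used together does not increase traffic.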