Array syntax, which is supported in many technical programming languages, adds expressive power by allowing operations on and assignments to whole arrays and array sections. To compile an array assignment statement for a uniprocessor, the language processor must convert the statement into a loop that has the same meaning. This process is called scalarization.

Scalarization presents a significant technical problem because an array assignment must be implemented as if all inputs are fetched before any outputs are stored. Since a loop intermixes loads and stores, the compiler typically allocates a temporary array to hold the intermediate result. Because these extra temporary arrays can degrade cache performance, many techniques have been developed to avoid their use or minimize their size.

In this paper, we present a novel application of two compiler strategies, loop alignment and loop skewing, to address this problem. We show that these strategies can achieve the asymptotically minimal memory allocation for stencil computations. Our experiments demonstrate that loop alignment and loop skewing are extremely effective in improving the memory hierarchy performance of Fortran 90 array code on standard uniprocessors. The result should be applicable to other array languages, such as MATLAB.
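
The fetch-before-store requirement described above can be illustrated with a small sketch. It is written in Python rather than Fortran 90, and it assumes a hypothetical one-dimensional stencil assignment equivalent to `A(2:N-1) = A(1:N-2) + A(3:N)`. The function names are illustrative only. The last function uses a hand-written scalar carry merely to show that the temporary can be far smaller than a full array; it is not the alignment and skewing algorithm of the paper.

```python
def array_semantics(a):
    """Reference result: load every input before storing any output."""
    n = len(a)
    # Full-size temporary: all right-hand-side values are computed first.
    temp = [a[i - 1] + a[i + 1] for i in range(1, n - 1)]
    out = list(a)
    out[1:n - 1] = temp
    return out


def naive_loop(a):
    """Incorrect scalarization: loads and stores are intermixed."""
    out = list(a)
    for i in range(1, len(out) - 1):
        # out[i - 1] was already overwritten by the previous iteration.
        out[i] = out[i - 1] + out[i + 1]
    return out


def scalar_carry_loop(a):
    """Correct scalarization that keeps only one old value (assumed sketch)."""
    out = list(a)
    if len(out) < 3:
        return out
    prev = out[0]  # original value of out[i - 1]
    for i in range(1, len(out) - 1):
        cur = out[i]               # save the original value before storing
        out[i] = prev + out[i + 1]
        prev = cur
    return out


data = [1, 2, 3, 4, 5]
print(array_semantics(data))    # [1, 4, 6, 8, 5]
print(naive_loop(data))         # [1, 4, 8, 13, 5]
print(scalar_carry_loop(data))  # [1, 4, 6, 8, 5]
```

In this sketch the naive loop diverges from the array semantics at the third element, because it reads a value that an earlier iteration has already overwritten, while the scalar-carry version matches the reference using constant extra storage.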