Scalarization using loop alignment and loop skewing

Authors:
Yuan Zhao; Ken Kennedy
Affiliations:
Computer Science Department, Rice University, 6100 Main St, Houston, Texas (both authors)
Venue:
The Journal of Supercomputing
Year:
2005

Abstract

Array syntax, which is supported in many technical programming languages, adds expressive power by allowing operations on and assignments to whole arrays and array sections. To compile an array assignment statement for a uniprocessor, the language processor must convert the statement into a loop that has the same meaning. This process is called scalarization.

Scalarization presents a significant technical problem because an array assignment must be implemented as if all inputs are fetched before any outputs are stored. Since a loop intermixes loads and stores, the compiler typically allocates a temporary array to hold the intermediate result. Because these extra temporary arrays can degrade cache performance, many techniques have been developed to avoid them or minimize their size.

In this paper, we present a novel application of two compiler strategies, loop alignment and loop skewing, to address this problem. We show that these strategies can achieve asymptotically minimal memory allocation for stencil computations. Our experiments demonstrate that loop alignment and loop skewing are extremely effective in improving the memory hierarchy performance of Fortran 90 array code on standard uniprocessors. The results should be applicable to other array languages, such as MATLAB.
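The fetch-before-store rule, and why it pushes a compiler toward temporaries, can be seen on a small one-dimensional stencil. The Python sketch below is illustrative only and is not the authors' algorithm. It assumes the hypothetical Fortran 90 statement `A(2:N-1) = A(1:N-2) + A(3:N)`, written with 0-based indices, and all function names are hypothetical. It contrasts a naive loop (wrong), a full temporary (correct, O(N) extra storage), and a loop in which the store is aligned one iteration later so that only a constant number of values is buffered. Loop skewing is not covered.

```python
from typing import List, Optional

# Hypothetical example statement (Fortran 90, 1-based):  A(2:N-1) = A(1:N-2) + A(3:N)
# In 0-based Python terms: for every interior i, a[i] = old a[i-1] + old a[i+1].


def naive_scalarize(a: List[float]) -> List[float]:
    """Direct translation into one loop. Incorrect: a[i-1] was already
    overwritten in the previous iteration by the time it is read."""
    a = list(a)
    for i in range(1, len(a) - 1):
        a[i] = a[i - 1] + a[i + 1]
    return a


def scalarize_with_temporary(a: List[float]) -> List[float]:
    """Correct scalarization with a full temporary: fetch every right-hand
    side value first, then store. Uses O(N) extra storage."""
    a = list(a)
    n = len(a)
    t = [a[i - 1] + a[i + 1] for i in range(1, n - 1)]  # fetch phase
    for i in range(1, n - 1):
        a[i] = t[i - 1]  # store phase
    return a


def scalarize_aligned(a: List[float]) -> List[float]:
    """Correct scalarization with the store aligned one iteration later.
    The old a[i-1] is last read in iteration i, so storing its new value
    after that read is safe; only two scalars are live at once (O(1))."""
    a = list(a)
    n = len(a)
    if n < 3:
        return a  # no interior elements to assign
    pending: Optional[float] = None  # result for index i-1, not yet stored
    for i in range(1, n - 1):
        value = a[i - 1] + a[i + 1]  # compute index i from old values
        if pending is not None:
            a[i - 1] = pending  # delayed (aligned) store for index i-1
        pending = value
    a[n - 2] = pending  # epilogue: store the last interior result
    return a


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(naive_scalarize(data))           # [1.0, 4.0, 8.0, 13.0, 5.0]  (wrong)
    print(scalarize_with_temporary(data))  # [1.0, 4.0, 6.0, 8.0, 5.0]
    print(scalarize_aligned(data))         # [1.0, 4.0, 6.0, 8.0, 5.0]
```

The aligned version works because every read of an element's old value happens no later than the iteration in which its new value is stored. For a two-dimensional five-point stencil scanned row by row, the same reasoning suggests delaying each store by one row, so the buffer holds roughly one row instead of the whole array.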