Compiling stencils in High Performance Fortran

Authors:
Gerald Roth; John Mellor-Crummey; Ken Kennedy; R. Gregg Brickner
Affiliations:
Rice University, Houston, TX; Rice University, Houston, TX; Rice University, Houston, TX; Los Alamos National Laboratory, Los Alamos, NM
Venue:
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Year:
1997

Abstract

For many Fortran90 and HPF programs that perform dense matrix computations, the main computational portion of the program belongs to a class of kernels known as stencils. Stencil computations are commonly used in solving partial differential equations, in image processing, and in geometric modeling. Handling such stencils efficiently is critical for achieving high performance on distributed-memory machines. Compiling stencils into efficient code is considered so important that some companies have built special-purpose compilers for them, and others have added stencil recognizers to existing compilers.

In this paper we present a general compilation strategy for stencils written using Fortran90 array constructs. Our strategy can optimize single-statement or multi-statement stencils, and it applies equally well to stencils specified with shift intrinsics or with array syntax. The strategy eliminates the need for pattern-recognition algorithms by orchestrating a set of optimizations that address the overhead of both intraprocessor and interprocessor data movement resulting from the translation of Fortran90 array constructs. Our experimental results show that code produced by this strategy matches or beats the best code produced by the special-purpose compilers and pattern-recognition schemes known to us. In addition, our strategy produces highly optimized code in situations where the others fail, yielding performance improvements of several orders of magnitude, and thus provides a stencil compilation strategy that is more robust than its predecessors.
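To make the two specification styles concrete, the following is a minimal sketch of a 5-point stencil written once with circular shifts and once with offset array sections. It is an illustration only, not the compilation strategy of the paper: Python stands in for Fortran90, the helper `cshift` is a hypothetical emulation of the `CSHIFT` intrinsic, and the function names, the coefficients `c0` and `c1`, and the 4x4 grid are all assumptions chosen for the example.

```python
def cshift(a, shift, dim):
    """Circular shift of a 2-D list, emulating Fortran 90 CSHIFT semantics:
    element i of the result along `dim` is element (i + shift) mod n of `a`.
    Assumption: `dim` is 0 (rows) or 1 (columns), not Fortran's 1-based DIM."""
    rows, cols = len(a), len(a[0])
    if dim == 0:
        return [list(a[(i + shift) % rows]) for i in range(rows)]
    return [[a[i][(j + shift) % cols] for j in range(cols)] for i in range(rows)]


def stencil_shift(src, c0, c1):
    """5-point stencil expressed as whole-array arithmetic on shifted copies.
    Each shifted copy is a full temporary array."""
    north, south = cshift(src, -1, 0), cshift(src, 1, 0)
    west, east = cshift(src, -1, 1), cshift(src, 1, 1)
    rows, cols = len(src), len(src[0])
    return [[c0 * src[i][j]
             + c1 * (north[i][j] + south[i][j] + west[i][j] + east[i][j])
             for j in range(cols)] for i in range(rows)]


def stencil_sections(src, c0, c1):
    """The same stencil on interior points only, as offset array sections
    would express it; boundary elements are left at 0.0."""
    rows, cols = len(src), len(src[0])
    dst = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            dst[i][j] = c0 * src[i][j] + c1 * (
                src[i - 1][j] + src[i + 1][j] + src[i][j - 1] + src[i][j + 1])
    return dst


def interior(a):
    """Drop the first and last row and column."""
    return [row[1:-1] for row in a[1:-1]]


grid = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
by_shift = stencil_shift(grid, 0.5, 0.125)
by_sections = stencil_sections(grid, 0.5, 0.125)
print(interior(by_shift) == interior(by_sections))
```

The two forms agree on interior points and differ only at the boundary, where the circular shift wraps around. The shift form builds four full-size temporary copies of the source array, which gives a rough picture of the intraprocessor data movement the abstract refers to.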