For many Fortran90 and HPF programs that perform dense matrix computations, the main computational portion of the program belongs to a class of kernels known as stencils. Stencil computations are commonly used in solving partial differential equations, in image processing, and in geometric modeling. Handling such stencils efficiently is critical for achieving high performance on distributed-memory machines. Compiling stencils into efficient code is considered so important that some companies have built special-purpose compilers for them, and others have added stencil recognizers to existing compilers.

In this paper we present a general compilation strategy for stencils written using Fortran90 array constructs. Our strategy can optimize single-statement or multi-statement stencils, and it applies equally well to stencils specified with shift intrinsics and to those specified with array syntax. The strategy eliminates the need for pattern-recognition algorithms by orchestrating a set of optimizations that address the overhead of both the intraprocessor and the interprocessor data movement that results from translating Fortran90 array constructs. Our experimental results show that code produced by this strategy matches or beats the best code produced by the special-purpose compilers and pattern-recognition schemes known to us. In addition, our strategy produces highly optimized code in situations where the others fail, yielding performance improvements of several orders of magnitude, and thus provides a stencil compilation strategy that is more robust than its predecessors.
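As an illustration of the two source-level forms mentioned above, the following minimal sketch writes the same 5-point stencil once with circular shifts (modeled on the Fortran90 `CSHIFT` intrinsic) and once with explicit interior indexing (modeled on array-section syntax). It is written in Python purely for illustration and does not show the compilation strategy itself; the helper names `cshift`, `add`, `stencil_shift`, `stencil_sections`, and `interior` are hypothetical and do not come from the paper.

```python
def cshift(a, shift, dim):
    """Circular shift of a 2-D list, modeled on Fortran90 CSHIFT:
    for dim == 0, result[i][j] = a[(i + shift) % n][j]."""
    n, m = len(a), len(a[0])
    if dim == 0:
        return [[a[(i + shift) % n][j] for j in range(m)] for i in range(n)]
    return [[a[i][(j + shift) % m] for j in range(m)] for i in range(n)]


def add(*arrays):
    """Elementwise sum of equally shaped 2-D lists."""
    n, m = len(arrays[0]), len(arrays[0][0])
    return [[sum(arr[i][j] for arr in arrays) for j in range(m)]
            for i in range(n)]


def stencil_shift(src):
    """Shift-intrinsic form, roughly analogous to
    DST = CSHIFT(SRC,-1,1) + CSHIFT(SRC,+1,1)
        + CSHIFT(SRC,-1,2) + CSHIFT(SRC,+1,2)
    Each shift conceptually creates a whole-array temporary."""
    return add(cshift(src, -1, 0), cshift(src, 1, 0),
               cshift(src, -1, 1), cshift(src, 1, 1))


def stencil_sections(src):
    """Array-section form, roughly analogous to
    DST(2:N-1,2:M-1) = SRC(1:N-2,2:M-1) + SRC(3:N,2:M-1)
                     + SRC(2:N-1,1:M-2) + SRC(2:N-1,3:M)
    Only interior points are updated; the boundary stays zero here."""
    n, m = len(src), len(src[0])
    dst = [[0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            dst[i][j] = (src[i - 1][j] + src[i + 1][j]
                         + src[i][j - 1] + src[i][j + 1])
    return dst


def interior(a):
    """Drop the outermost rows and columns of a 2-D list."""
    return [row[1:-1] for row in a[1:-1]]


if __name__ == "__main__":
    src = [[4 * i + j for j in range(4)] for i in range(4)]
    # The two forms agree on all interior points.
    print(interior(stencil_shift(src)) == interior(stencil_sections(src)))
```

The two forms differ only on the boundary, where the circular shifts wrap around. The shift form also suggests why naive translation is costly: each shift conceptually materializes a full temporary array, which is the kind of data-movement overhead the abstract refers to.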