With current compilers for High Performance Fortran (HPF), substantial restructuring and hand-optimization may be required to obtain acceptable performance from an HPF port of an existing Fortran application. A key goal of the Rice dHPF compiler project is to develop optimization techniques that can provide consistently high performance for a broad spectrum of scientific applications with minimal restructuring of existing Fortran 77 or Fortran 90 applications. This paper presents four new optimization techniques that we developed to support efficient parallelization of codes with minimal restructuring. These optimizations are: computation partition selection for loop nests that use privatizable arrays, along with partial replication of boundary computations to reduce communication overhead; communication-sensitive loop distribution to eliminate inner-loop communications; interprocedural selection of computation partitions; and data availability analysis to eliminate redundant communications. We studied the effectiveness of the dHPF compiler, which incorporates these optimizations, in parallelizing serial versions of the NAS SP and BT application benchmarks. We present experimental results comparing the performance of hand-written MPI code for the benchmarks against code generated from HPF by the dHPF compiler and by the Portland Group's pghpf compiler. Using the compilation techniques described in this paper, we achieve performance within 15% of the hand-written MPI code on 25 processors for BT and within 33% for SP. Furthermore, these results are obtained with HPF versions of the benchmarks that were created with minimal restructuring of the serial code (modifying only approximately 5% of the code).
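
To make the kind of code an HPF compiler targets concrete, the following is a minimal, illustrative HPF sketch, not taken from the paper or from the NAS benchmarks; the processor count, array size, and variable names are assumptions for illustration only. A one-dimensional Jacobi-style stencil over a BLOCK-distributed array requires nearest-neighbor communication at block boundaries, the kind of overhead that optimizations such as partial replication of boundary computations aim to reduce.

PROGRAM stencil_sketch
  ! Illustrative HPF sketch only; names and sizes are hypothetical.
  INTEGER, PARAMETER :: n = 1024
  REAL :: a(n), b(n)
  INTEGER :: i, step
!HPF$ PROCESSORS p(4)
!HPF$ DISTRIBUTE a(BLOCK) ONTO p
!HPF$ ALIGN b(i) WITH a(i)
  a = 0.0
  a(1) = 1.0
  a(n) = 1.0
  DO step = 1, 100
!HPF$ INDEPENDENT
    DO i = 2, n - 1
      ! References to a(i-1) and a(i+1) cross block boundaries on the
      ! processors owning the edges of each block, inducing communication.
      b(i) = 0.5 * (a(i-1) + a(i+1))
    END DO
    a(2:n-1) = b(2:n-1)
  END DO
  PRINT *, 'a(n/2) =', a(n/2)
END PROGRAM stencil_sketch

Under a straightforward owner-computes partitioning, each sweep of the time-step loop needs a boundary exchange before the inner loop; analyses of the kind described in the abstract, such as data availability analysis, can recognize when previously communicated values are still valid and suppress redundant exchanges.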