An HPF compiler for the IBM SP2

Authors:
Manish Gupta; Sam Midkiff; Edith Schonberg; Ven Seshadri; David Shields; Ko-Yang Wang; Wai-Mee Ching; Ton Ngo
Affiliations:
IBM T.J. Watson Research, P.O. Box 704, Yorktown Heights, NY; IBM T.J. Watson Research, P.O. Box 704, Yorktown Heights, NY; IBM T.J. Watson Research, P.O. Box 704, Yorktown Heights, NY; IBM Software Solutions Division, 1150 Eglinton Ave. East, North York, Ontario, Canada, M3C 1V7; IBM T.J. Watson Research, P.O. Box 704, Yorktown Heights, NY; IBM T.J. Watson Research, P.O. Box 704, Yorktown Heights, NY; IBM T.J. Watson Research, P.O. Box 704, Yorktown Heights, NY; IBM T.J. Watson Research, P.O. Box 704, Yorktown Heights, NY
Venue:
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Year:
1995

Abstract

We describe pHPF, a research prototype HPF compiler for the IBM SP series of parallel machines. The compiler accepts Fortran 90 and Fortran 77 programs augmented with HPF directives as input; sequential loops are automatically parallelized. The compiler supports symbolic analysis of expressions, which allows parameters such as the number of processors to remain unknown at compile time without significantly affecting performance. Communication schedules and computation guards are generated in a parameterized form at compile time. Several novel optimizations and improved versions of well-known optimizations have been implemented in pHPF to exploit parallelism and reduce communication costs. These optimizations include elimination of redundant communication using data-availability analysis; use of collective communication; new techniques for mapping scalar variables; coarse-grain wavefronting; and communication reduction in multi-dimensional shift communications. We present experimental results for some well-known benchmark routines. The results show the effectiveness of the compiler in generating efficient code for HPF programs.
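To make the idea of parameterized computation guards and communication schedules concrete, the following is a minimal sketch in Python, not pHPF's generated code. It assumes a one-dimensional array of size n with an HPF-style BLOCK distribution (block size ceil(n / P)) and the owner-computes rule; the function names block_bounds, local_iterations, and shift_receives are hypothetical and chosen only for illustration. The number of processors is an ordinary runtime argument, so the same code serves any processor count.

```python
def block_bounds(n, nprocs, rank):
    """Half-open range [lo, hi) of global indices 0..n-1 owned by `rank`
    under a BLOCK distribution with block size ceil(n / nprocs)."""
    block = -(-n // nprocs)  # ceiling division
    lo = min(rank * block, n)
    hi = min(lo + block, n)
    return lo, hi


def local_iterations(n, nprocs, rank, loop_lo, loop_hi):
    """Computation guard folded into loop bounds (assumed owner-computes):
    keep only iterations i in [loop_lo, loop_hi) whose A[i] this rank owns."""
    lo, hi = block_bounds(n, nprocs, rank)
    return range(max(lo, loop_lo), min(hi, loop_hi))


def shift_receives(n, nprocs, rank):
    """Receive schedule for the loop `A[i] = B[i-1]`, i in [1, n), with A and B
    distributed identically. Returns (source_rank, global_index) pairs."""
    block = -(-n // nprocs)
    lo, _ = block_bounds(n, nprocs, rank)
    iters = local_iterations(n, nprocs, rank, 1, n)
    needed = []
    # Only the first local iteration can read an element owned by a neighbor.
    if len(iters) > 0 and iters[0] - 1 < lo:
        idx = iters[0] - 1
        needed.append((idx // block, idx))
    return needed


if __name__ == "__main__":
    n = 10
    for nprocs in (4, 8):  # processor count is only known at run time
        for rank in range(nprocs):
            its = local_iterations(n, nprocs, rank, 1, n)
            print(nprocs, rank, list(its), shift_receives(n, nprocs, rank))
```

Here the guard is expressed as shrunken loop bounds rather than a per-iteration test, and ranks that own no elements simply get an empty range and an empty schedule.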