Process decomposition through locality of reference
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Efficiently computing static single assignment form and the control dependence graph
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiling Fortran D for MIMD distributed-memory machines
Communications of the ACM
A methodology for high-level synthesis of communication on multicomputers
ICS '92 Proceedings of the 6th international conference on Supercomputing
The high performance Fortran handbook
The high performance Fortran handbook
A static parameter based performance prediction tool for parallel programs
ICS '93 Proceedings of the 7th international conference on Supercomputing
GIVE-N-TAKE—a balanced code placement framework
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Compiling Communication-Efficient Programs for Massively Parallel Machines
IEEE Transactions on Parallel and Distributed Systems
Compiling Global Name-Space Parallel Loops for Distributed Execution
IEEE Transactions on Parallel and Distributed Systems
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
Visualizing the execution of High Performance Fortran (HPF) programs
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
An Overview of a Compiler for Scalable Parallel Machines
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
A Compilation Approach for Fortran 90D/ HPF Compilers
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
A Unified Data-Flow Framework for Optimizing Communication
LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing
Global communication analysis and optimization
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Static analysis to reduce synchronization costs in data-parallel programs
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A Unified Framework for Optimizing Communication in Data-Parallel Programs
IEEE Transactions on Parallel and Distributed Systems
A Framework for Exploiting Task and Data Parallelism on Distributed Memory Multicomputers
IEEE Transactions on Parallel and Distributed Systems
On the Automatic Parallelization of the Perfect Benchmarks®
IEEE Transactions on Parallel and Distributed Systems
Using integer sets for data-parallel program analysis and optimization
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
An efficient uniform run-time scheme for mixed regular-irregular applications
ICS '98 Proceedings of the 12th international conference on Supercomputing
Loop fusion in high performance Fortran
ICS '98 Proceedings of the 12th international conference on Supercomputing
High-level semantic optimization of numerical codes
ICS '99 Proceedings of the 13th international conference on Supercomputing
A global communication optimization technique based on data-flow analysis and linear algebra
ACM Transactions on Programming Languages and Systems (TOPLAS)
Minimizing Data and Synchronization Costs in One-Way Communication
IEEE Transactions on Parallel and Distributed Systems
A comparative study of the NAS MG benchmark across parallel languages and architectures
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
A framework for global communication analysis of optimizations
Compiler optimizations for scalable parallel systems
Advanced code generation for high performance Fortran
Compiler optimizations for scalable parallel systems
Compiling stencils in high performance Fortran
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Simplifying Control Flow in Compiler-Generated Parallel Code
International Journal of Parallel Programming
Fortran 90 in CSE: A Case Study
IEEE Computational Science & Engineering
On Privatization of Variables for Data-Parallel Execution
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Enhancing Software DSM for Compiler-Parallelized Applications
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Pipelining Wavefront Computations: Experiences and Performance
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Development of large scale high performance applications with a parallelizing compiler
Practical parallel computing
The design and implementation of a parallel array operator for the arbitrary remapping of data
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Sourcebook of parallel computing
Effective communication coalescing for data-parallel applications
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Trust but verify: monitoring remotely executing programs for progress and correctness
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Combined compile-time and runtime-driven, pro-active data movement in software DSM systems
LCR '04 Proceedings of the 7th workshop on Workshop on languages, compilers, and run-time support for scalable systems
Communication Optimizations for Fine-Grained UPC Applications
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Optimizing irregular shared-memory applications for distributed-memory systems
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Shared memory programming for large scale machines
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
On minimizing materializations of array-valued temporaries
ACM Transactions on Programming Languages and Systems (TOPLAS)
The rise and fall of High Performance Fortran: an historical object lesson
Proceedings of the third ACM SIGPLAN conference on History of programming languages
ACM SIGAPL APL Quote Quad
SPRINT: a tool to generate concurrent transaction-level models from sequential code
EURASIP Journal on Applied Signal Processing
Performance portable optimizations for loops containing communication operations
Proceedings of the 22nd annual international conference on Supercomputing
Automatic Transformation for Overlapping Communication and Computation
NPC '08 Proceedings of the IFIP International Conference on Network and Parallel Computing
MPI-aware compiler optimizations for improving communication-computation overlap
Proceedings of the 23rd international conference on Supercomputing
Pipelined parallelization in HPF programs on the earth simulator
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Automatic Parallelization in a Binary Rewriter
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Message strip-mining heuristics for high speed networks
VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science
A hybrid approach of OpenMP for clusters
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Hi-index | 0.00 |
We describe pHPF, an research prototype HPF compiler for the IBM SP series parallel machines. The compiler accepts as input Fortran 90 and Fortran 77 programs, augmented with HPF directives; sequential loops are automatically parallelized. The compiler supports symbolic analysis of expressions. This allows parameters such as the number of processors to be unknown at compile-time without significantly affecting performance. Communication schedules and computation guards are generated in a parameterized form at compile-time. Several novel optimizations and improved versions of well-known optimizations have been implemented in pHPF to exploit parallelism and reduce communication costs. These optimizations include elimination of redundant communication using data-availability analysis; using collective communication; new techniques for mapping scalar variables; coarse-grain wavefronting; and communication reduction in multi-dimensional shift communications. We present experimental results for some well-known benchmark routines. The results show the effectiveness of the compiler in generating efficient code for HPF programs.