Data dependence and its application to parallel processing
International Journal of Parallel Programming
An overview of the PTRAN analysis system for multiprocessing
Proceedings of the 1st International Conference on Supercomputing
Efficiently computing static single assignment form and the control dependence graph
ACM Transactions on Programming Languages and Systems (TOPLAS)
Detecting redundant accesses to array data
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
PTRAN—the IBM parallel translation system
Parallel functional languages and compilers
Compiling Fortran D for MIMD distributed-memory machines
Communications of the ACM
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
A methodology for high-level synthesis of communication on multicomputers
ICS '92 Proceedings of the 6th international conference on Supercomputing
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Communication optimization and code generation for distributed memory machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
GIVE-N-TAKE—a balanced code placement framework
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Global code motion/global value numbering
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Interprocedural partial redundancy elimination and its application to distributed memory compilation
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
The SP2 high-performance switch
IBM Systems Journal
The communication software and parallel environment of the IBM SP2
IBM Systems Journal
Compiler optimizations for eliminating barrier synchronization
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
An HPF compiler for the IBM SP2
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Compiler reduction of synchronisation in shared virtual memory systems
ICS '95 Proceedings of the 9th international conference on Supercomputing
Static analysis to reduce synchronization costs in data-parallel programs
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A Unified Framework for Optimizing Communication in Data-Parallel Programs
IEEE Transactions on Parallel and Distributed Systems
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Compiling Communication-Efficient Programs for Massively Parallel Machines
IEEE Transactions on Parallel and Distributed Systems
On the Efficient Engineering of Ambitious Program Analysis
IEEE Transactions on Software Engineering
Combining dependence and data-flow analyses to optimize communication
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
A Compilation Approach for Fortran 90D/ HPF Compilers
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Communication optimizations for parallel C programs
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Problem and machine sensitive communication optimization
ICS '98 Proceedings of the 12th international conference on Supercomputing
A global communication optimization technique based on data-flow analysis and linear algebra
ACM Transactions on Programming Languages and Systems (TOPLAS)
Minimizing Data and Synchronization Costs in One-Way Communication
IEEE Transactions on Parallel and Distributed Systems
Global optimization techniques for automatic parallelization of hybrid applications
ICS '01 Proceedings of the 15th international conference on Supercomputing
Static Single Assignment Form for Message-Passing Programs
International Journal of Parallel Programming
A framework for global communication analysis of optimizations
Compiler optimizations for scalable parallel systems
Advanced code generation for high performance Fortran
Compiler optimizations for scalable parallel systems
High performance Fortran compilation techniques for parallelizing scientific codes
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Algorithms for Supporting Compiled Communication
IEEE Transactions on Parallel and Distributed Systems
Compiling Several Classes of Communication Patterns on a Multithreaded Architecture
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Accurate Data and Context Management in Message-Passing Programs
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Gilgamesh: a multithreaded processor-in-memory architecture for petaflops computing
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
14.9 TFLOPS three-dimensional fluid simulation for fusion science with HPF on the Earth Simulator
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Aggressive communication optimizations for clusters of workstations
Cluster computing
CC--MPI: a compiled communication capable MPI prototype for ethernet switched clusters
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
A performance analysis of the Berkeley UPC compiler
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Evaluating support for global address space languages on the Cray X1
Proceedings of the 18th annual international conference on Supercomputing
A Complete Compiler Approach to Auto-Parallelizing C Programs for Multi-DSP Systems
IEEE Transactions on Parallel and Distributed Systems
Effective communication coalescing for data-parallel applications
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Communication Optimizations for Fine-Grained UPC Applications
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Compiler-directed channel allocation for saving power in on-chip networks
Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Towards a more efficient implementation of OpenMP for clusters via translation to global arrays
Parallel Computing - OpenMp
Shared memory programming for large scale machines
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
The Journal of Supercomputing
Automatic nonblocking communication for partitioned global address space programs
Proceedings of the 21st annual international conference on Supercomputing
Communication optimizations for global multi-threaded instruction scheduling
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Performance portable optimizations for loops containing communication operations
Proceedings of the 22nd annual international conference on Supercomputing
Slicing based code parallelization for minimizing inter-processor communication
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
A heuristic rule of partitioning irregular loop for parallelizing compilers
HPCA'09 Proceedings of the Second international conference on High Performance Computing and Applications
Efficient implementation of OpenMP for clusters with implicit data distribution
WOMPAT'04 Proceedings of the 5th international conference on OpenMP Applications and Tools: shared Memory Parallel Programming with OpenMP
Hi-index | 0.00 |
Reducing communication cost is crucial to achieving good performance on scalable parallel machines. This paper presents a new compiler algorithm for global analysis and optimization of communication in data-parallel programs. Our algorithm is distinct from existing approaches in that rather than handling loop-nests and array references one by one, it considers all communication in a procedure and their interactions under different placements before making a final decision on the placement of any communication. It exploits the flexibility resulting from this advanced analysis to eliminate redundancy, reduce the number of messages, and reduce contention for cache and communication buffers, all in a unified framework. In contrast, single loop-nest analysis often retains redundant communication, and more aggressive dataflow analysis on array sections can generate too many messages or cache and buffer contention. The algorithm has been implemented in the IBM pHPF compiler for High Performance Fortran. During compilation, the number of messages per processor goes down by as much as a factor of nine for some HPF programs. We present performance results for the IBM SP2 and a network of Sparc workstations (NOW) connected by a Myrinet switch. In many cases, the communication cost is reduced by a factor of two.