Because of the increasing computational power of workstations and PCs, the peak processing power of clusters of workstations has been rising rapidly. However, sustained performance on a variety of applications lags far behind, because these systems offer lower communication performance. In this paper, we focus on improving the communication performance of applications running on clusters through aggressive compiler optimizations. We present a general interprocedural technique for performing communication optimizations across procedure boundaries. Our technique uses the results of local analysis to model communication as a communication loop, and then performs flow-sensitive interprocedural data-flow analysis to eliminate redundant communication and to perform communication aggregation. Our experimental results and projected analysis on clusters show that aggressive compiler communication optimizations are very important for systems with low communication performance and high computational power.
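To illustrate the flavor of the flow-sensitive analysis described above, the sketch below computes an "available communication" data-flow fact over a control-flow graph and flags communications that are redundant because they have already been performed on every incoming path. This is a minimal, hypothetical model: the graph representation, the `(array, section)` communication descriptors, and the omission of killing writes are all simplifying assumptions, not the paper's actual formulation.

```python
# Minimal sketch of flow-sensitive redundant-communication analysis.
# Assumptions (not from the paper): the program is a simple CFG, each node
# generates a set of (array, section) communication descriptors, and no
# intervening writes invalidate a communication (kill sets omitted).
from collections import defaultdict

def available_comm(cfg, entry, comms):
    """Forward data-flow: a communication descriptor is 'available' at a node
    if it has been performed on every path from entry to that node.
    Returns, per node, the descriptors that are redundant there."""
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    preds = defaultdict(list)
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    universe = set().union(*comms.values()) if comms else set()
    # Standard initialization: entry starts empty, all others optimistic.
    avail_in = {n: (set() if n == entry else set(universe)) for n in nodes}
    avail_out = {n: avail_in[n] | comms.get(n, set()) for n in nodes}
    changed = True
    while changed:  # iterate to a fixpoint
        changed = False
        for n in nodes:
            if n == entry:
                new_in = set()
            else:
                new_in = set(universe)
                for p in preds[n]:
                    new_in &= avail_out[p]  # meet: intersection over preds
            new_out = new_in | comms.get(n, set())
            if new_in != avail_in[n] or new_out != avail_out[n]:
                avail_in[n], avail_out[n] = new_in, new_out
                changed = True
    # A communication at n is redundant if it is already available on entry.
    return {n: comms.get(n, set()) & avail_in[n] for n in nodes}
```

For example, in a diamond-shaped CFG where both branches communicate the same array section, a repeat of that communication at the join point is reported as redundant, since it is available along every path:

```python
cfg = {'entry': ['a', 'b'], 'a': ['join'], 'b': ['join'], 'join': []}
comms = {'a': {('X', '1:100')}, 'b': {('X', '1:100')}, 'join': {('X', '1:100')}}
redundant = available_comm(cfg, 'entry', comms)
# redundant['join'] contains ('X', '1:100'); redundant['a'] is empty
```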