Using a specially constructed, machine-independent communication optimizer that allows control over which optimizations are applied, we quantify the performance benefit of three well-known communication optimizations: redundant communication removal, communication combination, and communication pipelining. Results are reported relative to the base performance of benchmark programs compiled with the standard communication optimization, message vectorization. The effects on the number of calls to communication routines, both static and dynamic, are tabulated. We consider a variety of communication primitives, including those found in Intel's NX library, PVM, and the T3D's SHMEM library. The results show substantial improvement, with two combinations of optimizations being most effective.
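To make two of the named optimizations concrete, the following is a minimal sketch on a toy model, not the optimizer described above. It assumes each communication is a hypothetical `(array, direction)` pair within one basic block and that no array is modified between transfers; the function names `remove_redundant` and `combine` are illustrative only. Communication pipelining, which overlaps a transfer with computation, is not modeled here.

```python
from collections import OrderedDict

# Toy model (an assumption for illustration): each transfer is
# (array_name, direction), meaning "send the boundary of array_name
# to the neighboring processor in that direction".


def remove_redundant(transfers):
    """Redundant communication removal: drop a transfer if the same
    (array, direction) was already sent. Assumes no array is modified
    between the transfers, so the earlier data is still valid."""
    seen = set()
    kept = []
    for transfer in transfers:
        if transfer not in seen:
            seen.add(transfer)
            kept.append(transfer)
    return kept


def combine(transfers):
    """Communication combination: merge all transfers headed in the
    same direction into a single message carrying several arrays."""
    by_direction = OrderedDict()
    for array, direction in transfers:
        by_direction.setdefault(direction, []).append(array)
    return [(tuple(arrays), direction)
            for direction, arrays in by_direction.items()]


transfers = [("A", "east"), ("B", "east"), ("A", "east"), ("A", "north")]
after_removal = remove_redundant(transfers)
messages = combine(after_removal)

# Static count of communication calls at each stage.
print(len(transfers), len(after_removal), len(messages))  # prints "4 3 2"
```

In this toy case the static number of communication calls drops from four to three after redundant removal and to two after combination, which is the kind of call count the abstract says is tabulated.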