Global communication analysis and optimization

  • Authors:
  • Soumen Chakrabarti; Manish Gupta; Jong-Deok Choi

  • Affiliations:
  • Computer Science Division, U.C. Berkeley, CA; IBM T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY; IBM T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY

  • Venue:
  • PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
  • Year:
  • 1996

Abstract

Reducing communication cost is crucial to achieving good performance on scalable parallel machines. This paper presents a new compiler algorithm for global analysis and optimization of communication in data-parallel programs. Our algorithm is distinct from existing approaches in that, rather than handling loop nests and array references one by one, it considers all communication operations in a procedure and their interactions under different placements before making a final decision on the placement of any communication. It exploits the flexibility resulting from this analysis to eliminate redundancy, reduce the number of messages, and reduce contention for cache and communication buffers, all in a unified framework. In contrast, single loop-nest analysis often retains redundant communication, and more aggressive dataflow analysis on array sections can generate too many messages or cause cache and buffer contention. The algorithm has been implemented in the IBM pHPF compiler for High Performance Fortran. With this analysis, the number of messages per processor goes down by as much as a factor of nine for some HPF programs. We present performance results for the IBM SP2 and for a network of Sparc workstations (NOW) connected by a Myrinet switch. In many cases, the communication cost is reduced by a factor of two.
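
The contrast the abstract draws, per-loop-nest placement versus procedure-wide placement of communication, can be illustrated with a toy model. The sketch below is only a conceptual illustration and is not the pHPF algorithm; the CommRequirement class and both placement functions are hypothetical simplifications. It shows how merging identical communication requirements across loop nests removes a redundant message that per-loop analysis would retain.

```python
# Conceptual sketch (not the pHPF algorithm): model each communication
# requirement as the remote array section a statement needs, then compare
# naive per-loop-nest placement with a procedure-wide pass that merges
# identical requirements so the data is communicated only once.

from dataclasses import dataclass


@dataclass(frozen=True)
class CommRequirement:
    array: str       # distributed array whose elements are needed
    section: tuple   # (lo, hi) bounds of the remote section, simplified to 1-D
    statement: int   # program point that consumes the data


def per_loop_placement(reqs):
    """Naive scheme: one message per reference, placed just before its loop."""
    return [(r.statement, r.array, r.section) for r in reqs]


def global_placement(reqs):
    """Procedure-wide scheme: identical (array, section) pairs are satisfied
    by a single message hoisted to the earliest consuming statement."""
    merged = {}
    for r in reqs:
        key = (r.array, r.section)
        merged[key] = min(merged.get(key, r.statement), r.statement)
    return [(stmt, arr, sec) for (arr, sec), stmt in merged.items()]


if __name__ == "__main__":
    # Two loop nests both read the same shifted section of A, plus one
    # reference to B: per-loop placement sends 3 messages, global sends 2.
    reqs = [
        CommRequirement("A", (1, 100), statement=10),
        CommRequirement("B", (1, 50), statement=20),
        CommRequirement("A", (1, 100), statement=30),  # redundant with stmt 10
    ]
    print("per-loop :", len(per_loop_placement(reqs)), "messages")
    print("global   :", len(global_placement(reqs)), "messages")
```

The actual compiler weighs additional factors omitted here, such as message combining across distinct references and contention for cache and communication buffers, when choosing among candidate placements.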