Bamboo: translating MPI applications to a latency-tolerant, data-driven form

  • Authors:
  • Tan Nguyen;Pietro Cicotti;Eric Bylaska;Dan Quinlan;Scott B. Baden

  • Affiliations:
  • University of California, San Diego, La Jolla, CA;University of California, San Diego, La Jolla, CA;Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA;Center for Advanced Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA;University of California, San Diego, La Jolla, CA

  • Venue:
  • SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
  • Year:
  • 2012

Abstract

We present Bamboo, a custom source-to-source translator that transforms MPI C source into a data-driven form that automatically overlaps communication with available computation. Running on up to 98,304 processors of NERSC's Hopper system, we observe that Bamboo's overlap capability speeds up MPI implementations of a 3D Jacobi iterative solver and Cannon's matrix multiplication. Bamboo's generated code meets or exceeds the performance of hand-optimized MPI, which includes split-phase coding, the method classically employed to hide communication. We achieved our results with only modest amounts of programmer annotation and no intrusive reprogramming of the original application source.
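
To make the term "split-phase coding" concrete, the following is a minimal sketch of the hand-written pattern the abstract refers to, reduced to a 1D Jacobi sweep. It is not Bamboo's generated code and does not call MPI: the `pending_halo` type and `halo_wait` function are hypothetical stand-ins for a nonblocking exchange (in real MPI code, roughly `MPI_Irecv`/`MPI_Isend` followed by `MPI_Waitall`). The point is only the ordering: interior points that do not depend on ghost cells are updated while the exchange would be in flight, and the two boundary points are updated after it completes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an in-flight halo exchange. In real MPI code
   this role would be played by nonblocking sends/receives and a wait. */
typedef struct {
    double left;   /* value the left neighbour would send  */
    double right;  /* value the right neighbour would send */
} pending_halo;

/* "Wait" phase: deliver the ghost values into u[0] and u[n + 1]. */
static void halo_wait(const pending_halo *h, double *u, size_t n) {
    u[0] = h->left;
    u[n + 1] = h->right;
}

/* One Jacobi update of points lo..hi (inclusive), reading u, writing unew. */
static void sweep(const double *u, double *unew, size_t lo, size_t hi) {
    for (size_t i = lo; i <= hi; ++i)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
}

/* Blocking form: complete the exchange first, then update every point. */
void jacobi_blocking(const pending_halo *h, double *u, double *unew, size_t n) {
    assert(n >= 2);
    halo_wait(h, u, n);
    sweep(u, unew, 1, n);
}

/* Split-phase form: the exchange is assumed to have been initiated before
   this call. Interior points need no ghost cells, so they are updated
   first; the boundary points are updated after the wait. */
void jacobi_split_phase(const pending_halo *h, double *u, double *unew, size_t n) {
    assert(n >= 2);
    sweep(u, unew, 2, n - 1);  /* overlappable work: uses u[1..n] only  */
    halo_wait(h, u, n);        /* communication completes here          */
    sweep(u, unew, 1, 1);      /* left boundary point, needs u[0]       */
    sweep(u, unew, n, n);      /* right boundary point, needs u[n + 1]  */
}
```

Both functions expect `u` and `unew` to hold `n + 2` elements, with indices `0` and `n + 1` reserved for ghost cells, and they produce the same `unew[1..n]`. According to the abstract, Bamboo aims to obtain this kind of overlap automatically from annotated MPI source, without the programmer restructuring the loop by hand.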