The deep computing messaging framework: generalized scalable message passing on the Blue Gene/P supercomputer

  • Authors:
  • Sameer Kumar; Gabor Dozsa; Gheorghe Almasi; Philip Heidelberger; Dong Chen; Mark E. Giampapa; Michael Blocksome; Ahmad Faraj; Jeff Parker; Joseph Ratterman; Brian Smith; Charles J. Archer

  • Affiliations:
  • IBM, Yorktown Heights, NY, USA; IBM, Yorktown Heights, NY, USA; IBM, Yorktown Heights, NY, USA; IBM, Yorktown Heights, NY, USA; IBM, Yorktown Heights, NY, USA; IBM, Yorktown Heights, NY, USA; IBM, Rochester, MN, USA; IBM, Rochester, MN, USA; IBM, Rochester, MN, USA; IBM, Rochester, MN, USA; IBM, Rochester, MN, USA; IBM, Rochester, MN, USA

  • Venue:
  • Proceedings of the 22nd annual international conference on Supercomputing
  • Year:
  • 2008

Abstract

We present the architecture of the Deep Computing Messaging Framework (DCMF), a message passing runtime designed for the Blue Gene/P machine and other HPC architectures. DCMF has been designed to easily support several programming paradigms such as the Message Passing Interface (MPI), Aggregate Remote Memory Copy Interface (ARMCI), Charm++, and others. This support is possible because DCMF provides an application programming interface (API) with active messages and non-blocking collectives. DCMF is being open sourced and has a layered, component-based architecture with multiple levels of abstraction, allowing members of the community to contribute new components to its design at the various layers. The DCMF runtime can be extended to other architectures through the development of architecture-specific implementations of interface classes. The production DCMF runtime on Blue Gene/P takes advantage of the direct memory access (DMA) hardware to offload message passing work and achieve good overlap of computation and communication. We take advantage of the fact that the Blue Gene/P node is a symmetric multi-processor with four cache-coherent cores and use multi-threading to optimize the performance on the collective network. We also present a performance evaluation of the DCMF runtime on Blue Gene/P and show that it delivers performance close to hardware limits.
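
As an illustration only, the two ideas the abstract combines, active messages and architecture-specific implementations of interface classes, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not DCMF's API: every name here (`Device`, `LoopbackDevice`, `Endpoint`, `register`, `send`, `advance`) is hypothetical, and an in-process queue stands in for real hardware such as the Blue Gene/P DMA engine.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical names throughout; none of these are DCMF identifiers.
Handler = Callable[[int, bytes], None]   # called with (source rank, payload)
Packet = Tuple[int, int, bytes]          # (source rank, handler id, payload)


class Device(ABC):
    """Interface class: each architecture supplies its own implementation."""

    @abstractmethod
    def post(self, dest: int, packet: Packet) -> None: ...

    @abstractmethod
    def poll(self, rank: int) -> List[Packet]: ...


class LoopbackDevice(Device):
    """Toy in-process transport: one FIFO queue per destination rank."""

    def __init__(self, nranks: int) -> None:
        self.queues = [deque() for _ in range(nranks)]

    def post(self, dest: int, packet: Packet) -> None:
        self.queues[dest].append(packet)

    def poll(self, rank: int) -> List[Packet]:
        drained = list(self.queues[rank])
        self.queues[rank].clear()
        return drained


class Endpoint:
    """Per-rank messaging endpoint layered on top of a Device."""

    def __init__(self, rank: int, device: Device) -> None:
        self.rank = rank
        self.device = device
        self.handlers: Dict[int, Handler] = {}

    def register(self, handler_id: int, handler: Handler) -> None:
        self.handlers[handler_id] = handler

    def send(self, dest: int, handler_id: int, payload: bytes) -> None:
        # Non-blocking: the packet is queued and send() returns immediately.
        self.device.post(dest, (self.rank, handler_id, payload))

    def advance(self) -> int:
        # Make progress: run the registered handler for each arrived packet.
        packets = self.device.poll(self.rank)
        for src, handler_id, payload in packets:
            self.handlers[handler_id](src, payload)
        return len(packets)


device = LoopbackDevice(nranks=2)
ep0, ep1 = Endpoint(0, device), Endpoint(1, device)

received = []
ep1.register(7, lambda src, data: received.append((src, data)))

ep0.send(dest=1, handler_id=7, payload=b"hello")
assert received == []      # nothing runs until the receiver makes progress
ep1.advance()
print(received)
```

In this sketch the message names the handler that should process it at the destination, which is what makes it an active message, and porting to a different transport would mean writing another `Device` subclass while `Endpoint` stays unchanged.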