The Deep Computing Messaging Framework: generalized scalable message passing on the Blue Gene/P supercomputer

Authors:
Sameer Kumar;Gabor Dozsa;Gheorghe Almasi;Philip Heidelberger;Dong Chen;Mark E. Giampapa;Michael Blocksome;Ahmad Faraj;Jeff Parker;Joseph Ratterman;Brian Smith;Charles J. Archer
Affiliations:
IBM, Yorktown Heights, NY, USA;IBM, Yorktown Heights, NY, USA;IBM, Yorktown Heights, NY, USA;IBM, Yorktown Heights, NY, USA;IBM, Yorktown Heights, NY, USA;IBM, Yorktown Heights, NY, USA;IBM, Rochester, MN, USA;IBM, Rochester, MN, USA;IBM, Rochester, MN, USA;IBM, Rochester, MN, USA;IBM, Rochester, MN, USA;IBM, Rochester, MN, USA
Venue:
Proceedings of the 22nd annual international conference on Supercomputing
Year:
2008

Abstract

We present the architecture of the Deep Computing Messaging Framework (DCMF), a message-passing runtime designed for the Blue Gene/P machine and other HPC architectures. DCMF is designed to support several programming paradigms, such as the Message Passing Interface (MPI), the Aggregate Remote Memory Copy Interface (ARMCI), Charm++, and others. This support is possible because DCMF provides an application programming interface (API) with active messages and non-blocking collectives. DCMF is being open sourced and has a layered, component-based architecture with multiple levels of abstraction, allowing members of the community to contribute new components at the various layers. The DCMF runtime can be extended to other architectures by developing architecture-specific implementations of its interface classes. The production DCMF runtime on Blue Gene/P uses the direct memory access (DMA) hardware to offload message-passing work and achieve good overlap of computation and communication. Because the Blue Gene/P node is a symmetric multiprocessor with four cache-coherent cores, we use multi-threading to optimize performance on the collective network. We also present a performance evaluation of the DCMF runtime on Blue Gene/P and show that it delivers performance close to hardware limits.
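
To make the two ideas in the abstract concrete (active messages with non-blocking sends, and portability through architecture-specific implementations of interface classes), here is a minimal C++ sketch. It is not the DCMF API: every name in it (`Device`, `LoopbackDevice`, `Messager`, `Packet`, `demo_roundtrip`) is hypothetical and chosen for illustration only. The in-process loopback queue stands in for real network or DMA hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// All names below are hypothetical; this is NOT the DCMF API.

// One active message: the id of the handler to run at the receiver,
// the payload, and a callback fired once the send has completed.
struct Packet {
    std::size_t handler_id = 0;
    std::string payload;
    std::function<void()> on_send_done;
};

// Abstract interface class. Porting to a new architecture would mean
// writing another subclass (assumption: e.g. one backed by DMA hardware).
class Device {
public:
    virtual ~Device() = default;
    virtual void post(Packet p) = 0;    // enqueue and return immediately
    virtual bool poll(Packet& out) = 0; // fetch one arrived packet, if any
};

// Trivial in-process backend used only for this sketch.
class LoopbackDevice final : public Device {
public:
    void post(Packet p) override { queue_.push_back(std::move(p)); }
    bool poll(Packet& out) override {
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }
private:
    std::deque<Packet> queue_;
};

using Handler = std::function<void(const std::string&)>;

// Architecture-neutral layer: handler registration, non-blocking send,
// and an explicit progress call.
class Messager {
public:
    explicit Messager(Device& dev) : dev_(dev) {}

    std::size_t register_handler(Handler h) {
        handlers_.push_back(std::move(h));
        return handlers_.size() - 1;
    }

    // Non-blocking: returns as soon as the packet is handed to the device.
    void send(std::size_t handler_id, std::string payload,
              std::function<void()> on_done) {
        dev_.post(Packet{handler_id, std::move(payload), std::move(on_done)});
    }

    // Drain arrived packets, run their handlers, fire completion callbacks.
    // Returns the number of messages delivered.
    std::size_t advance() {
        std::size_t delivered = 0;
        Packet p;
        while (dev_.poll(p)) {
            handlers_.at(p.handler_id)(p.payload);
            if (p.on_send_done) p.on_send_done();
            ++delivered;
        }
        return delivered;
    }

private:
    Device& dev_;
    std::vector<Handler> handlers_;
};

// Sends one active message through the loopback device and reports
// what the handler received and whether the completion callback ran.
std::pair<std::string, bool> demo_roundtrip() {
    LoopbackDevice dev;
    Messager messager(dev);
    std::string received;
    bool done = false;

    const std::size_t id = messager.register_handler(
        [&](const std::string& s) { received = s; });
    messager.send(id, "hello", [&] { done = true; });

    // The send has returned but nothing has been delivered yet; in a real
    // runtime the caller could overlap computation here.
    assert(received.empty() && !done);

    messager.advance();
    return {received, done};
}
```

Calling `demo_roundtrip()` from a test or a `main` function exercises the whole path. The sketch keeps the upper layer (`Messager`) free of hardware details, so a different backend only needs a new `Device` subclass; how DCMF itself divides these responsibilities is not specified in the abstract.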