MPI-LAPI: An Efficient Implementation of MPI for IBM RS/6000 SP Systems
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the 11 IPPS/SPDP'99 Workshops Held in Conjunction with the 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing
Converse: An Interoperable Framework for Parallel Programming
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Shared memory programming for large scale machines
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Overview of the IBM Blue Gene/P project
IBM Journal of Research and Development
Blue Gene/L torus interconnection network
IBM Journal of Research and Development
Design and implementation of message-passing services for the Blue Gene/L supercomputer
IBM Journal of Research and Development
Achieving strong scaling with NAMD on blue Gene/L
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Non-data-communication Overheads in MPI: Analysis on Blue Gene/P
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Architecture of the Component Collective Messaging Interface
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
A configurable algorithm for parallel image-compositing applications
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Scalable communication protocols for dynamic sparse data exchange
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
The Importance of Non-Data-Communication Overheads in MPI
International Journal of High Performance Computing Applications
Architecture of the Component Collective Messaging Interface
International Journal of High Performance Computing Applications
AM++: a generalized active message framework
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Experiences with a Lightweight Supercomputer Kernel: Lessons Learned from Blue Gene's CNK
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Enabling concurrent multithreaded MPI communication on multicore petascale systems
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Network offloaded hierarchical collectives using ConnectX-2's CORE-Direct capabilities
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Parallel zero-copy algorithms for fast Fourier transform and conjugate gradient using MPI datatypes
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Designing Energy Efficient Communication Runtime Systems for Data Centric Programming Models
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Asynchronous PGAS runtime for Myrinet networks
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model
Extensible PGAS semantics for C++
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model
Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Improving communication performance in dense linear algebra via topology aware collectives
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
A cluster computer performance predictor for memory scheduling
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part II
A cost-effective heuristic to schedule local and remote memory in cluster computers
The Journal of Supercomputing
Composable, non-blocking collective operations on power7 IH
Proceedings of the 26th ACM international conference on Supercomputing
Collective algorithms for sub-communicators
Proceedings of the 26th ACM international conference on Supercomputing
Performance characterization of global address space applications: a case study with NWChem
Concurrency and Computation: Practice & Experience
Runtime detection and optimization of collective communication patterns
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Compass: a scalable simulator for an architecture for cognitive computing
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A high-productivity task-based programming model for clusters
Concurrency and Computation: Practice & Experience
Performance evaluation and optimization of nested high resolution weather simulations
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Designing energy efficient communication runtime systems: a view from PGAS models
The Journal of Supercomputing
Hardware support for fine-grained event-driven computation in Anton 2
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
LibWater: heterogeneous distributed computing made easy
Proceedings of the 27th international ACM conference on International conference on supercomputing
Expressing graph algorithms using generalized active messages
Proceedings of the 27th international ACM conference on International conference on supercomputing
IBM Blue Gene/Q system software stack
IBM Journal of Research and Development
Optimizing Memory Constrained Environments in Monte Carlo Nuclear Reactor Simulations
International Journal of High Performance Computing Applications
Hi-index | 0.00 |
We present the architecture of the Deep Computing Messaging Framework (DCMF), a message passing runtime designed for the Blue Gene/P machine and other HPC architectures. DCMF has been designed to easily support several programming paradigms such as the Message Passing Interface (MPI), Aggregate Remote Memory Copy Interface (ARMCI), Charm++, and others. This support is made possible as DCMF provides an application programming interface (API) with active messages and non-blocking collectives. DCMF is being open sourced and has a layered component based architecture with multiple levels of abstraction, allowing the members of the community to contribute new components to its design at the various layers. The DCMF runtime can be extended to other architectures through the development of architecture specific implementations of interface classes. The production DCMF runtime on Blue Gene/P takes advantage of the direct memory access (DMA) hardware to offload message passing work and achieve good overlap of computation and communication. We take advantage of the fact that the Blue Gene/P node is a symmetric multi-processor with four cache-coherent cores and use multi-threading to optimize the performance on the collective network. We also present a performance evaluation of the DCMF runtime on Blue Gene/P and show that it delivers performance close to hardware limits.