Design and Evaluation of Nemesis, a Scalable, Low-Latency, Message-Passing Communication Subsystem

Authors:
Darius Buntinas; Guillaume Mercier; William Gropp
Affiliations:
Argonne National Laboratory, USA; Argonne National Laboratory, USA; Argonne National Laboratory, USA
Venue:
CCGRID '06 Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid
Year:
2006

Citing 0
Cited 27

Feasibility study of MPI implementation on the heterogeneous multi-core Cell BE™ architecture
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures

Thread-safety in an MPI implementation: Requirements and analysis
Parallel Computing

Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem
Parallel Computing

Virtual machine aware communication libraries for high performance computing
Proceedings of the 2007 ACM/IEEE conference on Supercomputing

SMARTMAP: operating system support for efficient data sharing among processes on a multi-core processor
Proceedings of the 2008 ACM/IEEE conference on Supercomputing

A Buffered-Mode MPI Implementation for the Cell BE™ Processor
ICCS '07 Proceedings of the 7th international conference on Computational Science, Part I: ICCS 2007

A Prototype Implementation of MPI for SMARTMAP
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface

A distributed Key Message algorithm to optimize the communication in clusters
Parallel Computing

MPC-MPI: An MPI Implementation Reducing the Overall Memory Consumption
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface

Exploiting Direct Access Shared Memory for MPI On Multi-Core Processors
International Journal of High Performance Computing Applications

Optimizing a parallel runtime system for multicore clusters: a case study
Proceedings of the 2010 TeraGrid Conference

Scalability of communicators and groups in MPI
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing

A GPGPU transparent virtualization component for high performance computing clouds
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I

Implementing MPI on Windows: comparison with common approaches on Unix
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface

Locality and topology aware intra-node communication among multicore CPUs
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface

A uGNI-based MPICH2 Nemesis network module for the Cray XE
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface

Performance evaluation of thread-based MPI in shared memory
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface

Implementation and shared-memory evaluation of MPICH2 over the Nemesis communication subsystem
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface

Improving communication latency with the write-only architecture
Journal of Parallel and Distributed Computing

A synchronous mode MPI implementation on the Cell BE™ architecture
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications

An integrated runtime scheduler for MPI
EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface

Using DVFS to optimize time warp simulations
Proceedings of the Winter Simulation Conference

Redesigning MPI shared memory communication for large multi-core architecture
Computer Science - Research and Development

On the performance of concurrent transfers in collective algorithms
Proceedings of the 20th European MPI Users' Group Meeting

Proposing a new task model towards many-core architecture
Proceedings of the First International Workshop on Many-core Embedded Systems

Evaluation of messaging middleware for high-performance cloud computing
Personal and Ubiquitous Computing

An integrated fine-grain runtime system for MPI
Computing

Abstract

This paper presents Nemesis, a new low-level communication subsystem. Nemesis is designed and implemented to be scalable and efficient both for intranode communication over shared memory and for internode communication over high-performance networks, and it natively supports multiple communication methods. Nemesis has been integrated into MPICH2 as a CH3 channel and delivers better performance than the other dedicated communication channels in MPICH2. Furthermore, the resulting MPICH2 architecture outperforms other MPI implementations in point-to-point benchmarks.