Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem

Authors:
Darius Buntinas;Guillaume Mercier;William Gropp
Affiliations:
Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA;LaBRI, Université Bordeaux I - INRIA Futurs, France;Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
Venue:
Parallel Computing
Year:
2007

Abstract

This paper presents the implementation of MPICH2 over the Nemesis communication subsystem and the evaluation of its shared-memory performance. We describe design issues as well as some of the optimization techniques we employed. We conducted a performance evaluation over shared memory using microbenchmarks. The evaluation shows that MPICH2 Nemesis has very low communication overhead, making it suitable for smaller-grained applications.