An Evaluation of Two Implementation Strategies for Optimizing One-Sided Atomic Reduction

Authors:
Jarek Nieplocha;Vinod Tipparaju;Edoardo Apra
Affiliations:
Pacific Northwest National Laboratory;Pacific Northwest National Laboratory;Pacific Northwest National Laboratory
Venue:
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 9 - Volume 10
Year:
2005

Abstract

One-sided atomic reduction, also known as the accumulate operation, atomically combines the contents of a local buffer with data at a remote memory location. This operation is included in the MPI-2 standard as MPI_Accumulate. This paper discusses two strategies for implementing one-sided atomic reduction, called owner-computes and caller-computes. The performance of the two schemes was investigated on the HP Alphaserver SC45 and HP zx-2600 clusters, both equipped with the Quadrics Elan-3 network.
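
To make the semantics of the accumulate operation concrete, the following is a minimal single-process sketch, not the implementation evaluated in the paper. It assumes summation as the combining operation and uses threads and a lock as stand-ins for remote callers and for whatever mechanism enforces atomicity; the names `Window`, `accumulate`, and `run_demo` are hypothetical and do not come from MPI or from the paper. The sketch does not attempt to model the owner-computes or caller-computes strategies, which the abstract names but does not describe.

```python
import threading


class Window:
    """Toy stand-in for a region of memory that other processes may update."""

    def __init__(self, size):
        self.data = [0.0] * size
        # Assumption: a single lock models the atomicity guarantee.
        self.lock = threading.Lock()


def accumulate(window, offset, local_buf):
    """Atomically add local_buf element-wise into window.data at offset."""
    with window.lock:
        for i, value in enumerate(local_buf):
            window.data[offset + i] += value


def run_demo(num_callers=4, size=8):
    """Let several callers accumulate concurrently into the same target."""
    window = Window(size)
    threads = [
        # Caller number `rank` contributes the value rank + 1 to every element.
        threading.Thread(target=accumulate,
                         args=(window, 0, [float(rank + 1)] * size))
        for rank in range(num_callers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return window.data


if __name__ == "__main__":
    print(run_demo())
```

Because each update is applied under the lock, every element of the target ends up holding the sum of all contributions regardless of the order in which the callers run.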