One-sided atomic reduction, also known as the accumulate operation, atomically combines the contents of a local buffer with data at a remote memory location. This operation is included in the MPI-2 standard as MPI_Accumulate. This paper discusses two strategies for implementing one-sided atomic reduction, called owner-computes and caller-computes. The performance of the two schemes was investigated on the HP Alphaserver SC45 and HP zx-2600 clusters, both equipped with the Quadrics Elan-3 network.
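The abstract names the two strategies without defining them, so the following is only a toy, single-process sketch of what they might look like. It assumes that owner-computes means the caller ships its buffer to the owner, which applies the reduction itself, and that caller-computes means the caller locks the remote region, fetches it, reduces locally, and writes the result back. The `Target` class, the function names, and the use of threads and a lock in place of processes and a network are all hypothetical and are not taken from the paper or from any MPI implementation.

```python
import threading


class Target:
    """Hypothetical stand-in for a memory region owned by one process."""

    def __init__(self, n):
        self.data = [0.0] * n
        self.lock = threading.Lock()


def accumulate_owner_computes(target, local_buf):
    # Assumed meaning: the caller sends its buffer to the owner, and the
    # owner applies the reduction to its own memory atomically.
    shipped = list(local_buf)  # models the message sent to the owner
    with target.lock:
        for i, x in enumerate(shipped):
            target.data[i] += x


def accumulate_caller_computes(target, local_buf):
    # Assumed meaning: the caller locks the remote region, fetches it,
    # reduces locally, and writes the result back before unlocking.
    with target.lock:
        fetched = list(target.data)  # models a remote get
        combined = [a + b for a, b in zip(fetched, local_buf)]
        target.data[:] = combined  # models a remote put


def run(strategy, callers=8, n=4, repeats=100):
    """Have several concurrent callers accumulate into one target."""
    target = Target(n)

    def work(rank):
        for _ in range(repeats):
            strategy(target, [float(rank)] * n)

    threads = [threading.Thread(target=work, args=(r,)) for r in range(callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return target.data
```

Under these assumptions both variants leave the same final contents in the target, since each update is applied atomically. They differ in where the arithmetic happens and in how much data crosses the (here simulated) network: one transfer to the owner in the first case, a fetch plus a write-back in the second.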