LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimal broadcast and summation in the LogP model
SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Efficient algorithms for all-to-all communications in multi-port message-passing systems
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Designing broadcasting algorithms in the Postal Model for message-passing systems
Proceedings of the 4th ACM symposium on Parallel algorithms and architectures
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
MPI-FM: high performance MPI on workstation clusters
Journal of Parallel and Distributed Computing - Special issue on workstation clusters and network-based computing
Efficient Algorithms for the Reduce-Scatter Operation in LogGP
IEEE Transactions on Parallel and Distributed Systems
Building a high-performance collective communication library
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Vector Prefix and Reduction Computation on Coarse-Grained, Distributed-Memory Parallel Machines
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Document for a Standard Message-Passing Interface
Document for a Standard Message-Passing Interface
Pattern-independent detection of manual collectives in MPI programs
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Hi-index | 0.00 |
We discuss the efficient implementation of the MPI collective operation called reduce-scatter. We describe the implementation issues and the performance characterization of two algorithms for the reduce-scatter that have been proven to be highly efficient in theory under the assumption of fully connected parallel system. A performance comparison with existing mainstream implementations of the operation is presented which confirms the practical advantage of the new algorithms. Experiments show that the two algorithms have different characteristics which make them complementary in providing a performance gain over standard algorithms.