A Framework for Collective Personalized Communication

Authors:
Laxmikant V. Kalé;Sameer Kumar;Krishnan Varadarajan
Venue:
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Year:
2003

Citing 0
Cited 16

Message Scheduling for All-to-All Personalized Communication on Ethernet Switched Clusters
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01

Automatic generation and tuning of MPI collective communication routines
Proceedings of the 19th annual international conference on Supercomputing

Collective communication on architectures that support simultaneous communication over multiple links
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming

Performance evaluation of adaptive MPI
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming

A Message Scheduling Scheme for All-to-All Personalized Communication on Ethernet Switched Clusters
IEEE Transactions on Parallel and Distributed Systems

Optimizing a conjugate gradient solver with non-blocking collective operations
Parallel Computing

Implementation and performance analysis of non-blocking collective operations for MPI
Proceedings of the 2007 ACM/IEEE conference on Supercomputing

Multicast communication in wormhole-routed 2D torus networks with Hamiltonian cycle model
Journal of Systems Architecture: the EUROMICRO Journal

Overlapping Methods of All-to-All Communication and FFT Algorithms for Torus-Connected Massively Parallel Supercomputers
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis

A case for non-blocking collective operations
ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking

Optimizing a conjugate gradient solver with non-blocking collective operations
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface

High performance RDMA based all-to-all broadcast for InfiniBand clusters
HiPC'05 Proceedings of the 12th international conference on High Performance Computing

Improved point-to-point and collective communication performance with output-queued high-radix routers
HiPC'05 Proceedings of the 12th international conference on High Performance Computing

Blue Matter: strong scaling of molecular dynamics on Blue Gene/L
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II

Fast and efficient total exchange on two clusters
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing

A case for standard non-blocking collective operations
PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface

Abstract

This paper explores collective personalized communication. In all-to-all personalized communication (AAPC), for example, each processor sends a distinct message to every other processor. In many applications, however, the collective communication pattern is many-to-many: each processor sends a distinct message to only a subset of the processors. In this paper we first present strategies that optimize AAPC by reducing the per-message cost. We then present performance results for these strategies in both all-to-all and many-to-many scenarios. The strategies are implemented in a flexible, asynchronous library with a non-blocking interface, and a message-driven runtime system. This allows the collective communication to run concurrently with the application, if desired. As a result, the computational overhead of the communication is substantially reduced, at least on machines such as PSC Lemieux, which provide a co-processor capable of remote DMA. We demonstrate the advantages of our framework with performance results on several benchmarks and applications.
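
To make the AAPC pattern and the idea of reducing per-message cost concrete, the following is a minimal Python sketch, not the paper's library. It compares direct AAPC with two-phase message combining over a virtual square mesh. The abstract does not name its strategies, so the mesh scheme here is an illustrative assumption, as are the function names `direct_aapc` and `mesh_aapc` and the restriction to a perfect-square processor count. The sketch counts messages only and does not model time, bandwidth, or overlap with computation.

```python
import math


def direct_aapc(p):
    """Direct AAPC: every rank sends one distinct message to every other rank."""
    inbox = {dst: {} for dst in range(p)}
    sends = 0
    for src in range(p):
        for dst in range(p):
            if src != dst:
                inbox[dst][src] = (src, dst)  # payload tagged (source, destination)
                sends += 1
    return inbox, sends


def mesh_aapc(p):
    """Two-phase combining on a hypothetical side x side virtual mesh (assumption)."""
    side = math.isqrt(p)
    if side * side != p:
        raise ValueError("this sketch assumes a perfect-square processor count")
    sends = 0

    # Phase 1 (along rows): each rank sends every row-mate ONE combined message
    # holding all of its payloads destined for that row-mate's column.
    held = {rank: [] for rank in range(p)}
    for src in range(p):
        row = src // side
        for col in range(side):
            relay = row * side + col
            held[relay].extend(
                (src, dst) for dst in range(col, p, side) if dst != src
            )
            if relay != src:  # payloads kept locally cost no message
                sends += 1

    # Phase 2 (along columns): each relay forwards ONE combined message to
    # every column-mate; everything it holds is destined for its own column.
    inbox = {dst: {} for dst in range(p)}
    for relay in range(p):
        for payload in held[relay]:
            src, dst = payload
            inbox[dst][src] = payload
        sends += side - 1
    return inbox, sends


if __name__ == "__main__":
    p = 16
    direct_inbox, direct_sends = direct_aapc(p)
    mesh_inbox, mesh_sends = mesh_aapc(p)
    print("same deliveries:", direct_inbox == mesh_inbox)  # same deliveries: True
    print("direct sends:", direct_sends)                   # direct sends: 240
    print("mesh sends:", mesh_sends)                       # mesh sends: 96
```

With 16 processors, each rank sends 6 combined messages instead of 15 individual ones. The trade-off is that each payload travels in two messages rather than one, so combining of this kind tends to pay off when the fixed per-message cost dominates, that is, for small messages.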