Hypercube algorithms and implementations
SIAM Journal on Scientific and Statistical Computing
Solving problems on concurrent processors. Vol. 1: General techniques and regular problems
Solving problems on concurrent processors. Vol. 1: General techniques and regular problems
Comparison of two-dimensional FFT methods on the hypercube
C3P Proceedings of the third conference on Hypercube concurrent computers and applications - Volume 2
Optimum Broadcasting and Personalized Communication in Hypercubes
IEEE Transactions on Computers
Optimizing tridiagonal solvers for alternating direction methods on Boolean cube multiprocessors
SIAM Journal on Scientific and Statistical Computing
A bridging model for parallel computation
Communications of the ACM
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Efficient communication primitives on hypercubes
Concurrency: Practice and Experience
The IBM external user interface for scalable parallel systems
Parallel Computing - Special issue: message passing interfaces
Methods and problems of communication in usual networks
Proceedings of the international workshop on Broadcasting and gossiping 1990
Designing broadcasting algorithms in the Postal Model for message-passing systems
Proceedings of the 4th ACM symposium on Parallel algorithms and architectures
CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers
IEEE Transactions on Parallel and Distributed Systems
The cube-connected cycles: a versatile network for parallel computation
Communications of the ACM
Fault-Tolerant Meshes and Hypercubes with Minimal Numbers of Spares
IEEE Transactions on Computers
The Hierarchical Factor Algorithm for All-to-All Communication (Research Note)
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Contention-Aware Communication Schedule for High-Speed Communication
Cluster Computing
Performance Analysis of MPI Collective Operations
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 15 - Volume 16
Automatic generation and tuning of MPI collective communication routines
Proceedings of the 19th annual international conference on Supercomputing
Performance Evaluation of Allgather Algorithms On Terascale Linux Cluster with Fast Ethernet
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Exchanging messages of different sizes
Journal of Parallel and Distributed Computing
STAR-MPI: self tuned adaptive routines for MPI collective operations
Proceedings of the 20th annual international conference on Supercomputing
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Performance analysis of MPI collective operations
Cluster Computing
An efficient MPI_allgather for grids
Proceedings of the 16th international symposium on High performance distributed computing
A study of process arrival patterns for MPI collective operations
Proceedings of the 21st annual international conference on Supercomputing
Performance without pain = productivity: data layout and collective communication in UPC
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Move-optimal gossiping among mobile agents
Theoretical Computer Science
Implications of application usage characteristics for collective communication offload
International Journal of High Performance Computing and Networking
A Simple, Pipelined Algorithm for Large, Irregular All-gather Problems
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Bandwidth efficient all-to-all broadcast on switched clusters
International Journal of Parallel Programming
Efficient high performance collective communication for the cell blade
Proceedings of the 23rd international conference on Supercomputing
A study of process arrival patterns for MPI collective operations
International Journal of Parallel Programming
Process Arrival Pattern and Shared Memory Aware Alltoall on InfiniBand
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Process Mapping for MPI Collective Communications
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Scalable communication protocols for dynamic sparse data exchange
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Modeling advanced collective communication algorithms on cell-based systems
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
A Pipelined Algorithm for Large, Irregular All-Gather Problems
International Journal of High Performance Computing Applications
Assessing contention effects on MPI_alltoall communications
GPC'07 Proceedings of the 2nd international conference on Advances in grid and pervasive computing
Optimal moves for gossiping among mobile agents
SIROCCO'07 Proceedings of the 14th international conference on Structural information and communication complexity
Accelerating parallel analysis of scientific simulation data via Zazen
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Optimizing collective communication on multicores
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Efficient RDMA-based multi-port collectives on multi-rail QsNetII clusters
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Collective operations in NEC's high-performance MPI libraries
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Kernel-based offload of collective operations: implementation, evaluation and lessons learned
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Efficient allgather for regular SMP-Clusters
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
Efficient shared memory and RDMA based design for MPI_Allgather over infiniband
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
High performance RDMA based all-to-all broadcast for infiniband clusters
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
An optimal broadcast algorithm adapted to SMP clusters
PVM/MPI'05 Proceedings of the 12th European PVM/MPI users' group conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Faster topology-aware collective algorithms through non-minimal communication
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Congestion avoidance on manycore high performance computing systems
Proceedings of the 26th ACM international conference on Supercomputing
Composable, non-blocking collective operations on power7 IH
Proceedings of the 26th ACM international conference on Supercomputing
High-performance RMA-based broadcast on the intel SCC
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
FFTs and multiple collective communication on multiprocessor-node architectures
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
Fast and efficient total exchange on two clusters
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Open issues in MPI implementation
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Bandwidth-optimal all-to-all exchanges in fat tree networks
Proceedings of the 27th international ACM conference on International conference on supercomputing
Hi-index | 0.00 |
We present efficient algorithms for two all-to-all communication operations in message-passing systems: index (or all-to-all personalized communication) and concatenation (or all-to-all broadcast). We assume a model of a fully connected message-passing system, in which the performance of any point-to-point communication is independent of the sender-receiver pair. We also assume that each processor has k驴 1 ports, through which it can send and receive k messages in every communication round. The complexity measures we use are independent of the particular system topology and are based on the communication start-up time, and on the communication bandwidth.In the index operation among n processors, initially, each processor has n blocks of data, and the goal is to exchange the ith block of processor j with the jth block of processor i. We present a class of index algorithms that is designed for all values of n and that features a trade-off between the communication start-up time and the data transfer time. This class of algorithms includes two special cases: an algorithm that is optimal with respect to the measure of the start-up time, and an algorithm that is optimal with respect to the measure of the data transfer time. We also present experimental results featuring the performance tuneability of our index algorithms on the IBM SP-1 parallel system.In the concatenation operation, among n processors, initially, each processor has one block of data, and the goal is to concatenate the n blocks of data from the n processors, and to make the concatenation result known to all the processors. We present a concatenation algorithm that is optimal, for most values of n, in the number of communication rounds and in the amount of data transferred.