Portable and scalable algorithm for irregular all-to-all communication

Authors:
Wenheng Liu; Cho-Li Wang; Viktor K. Prasanna
Affiliations:
Department of EE-Systems, University of Southern California, Los Angeles, California (all authors)
Venue:
Journal of Parallel and Distributed Computing
Year:
2002

Abstract

In irregular all-to-all communication, messages are exchanged between every pair of processors. The message sizes vary from processor to processor and are known only at run time. This is a fundamental communication primitive in parallelizing irregularly structured scientific computations. Our algorithm reduces the total number of message start-ups. It also reduces node contention by smoothing out the lengths of the messages communicated. Compared with earlier approaches, our algorithm provides deterministic performance and reduces the buffer space needed at the nodes during message passing. The performance of the algorithm is characterized using a simple communication model of high-performance computing (HPC) platforms. We present implementations on the Cray T3D and IBM SP2 using C and the Message Passing Interface (MPI) standard. These implementations can be easily ported to other HPC platforms. The results show the effectiveness of the proposed technique as well as the interplay among the machine size, the variance in message length, and the network interface.
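To make the role of start-ups and message-length variance concrete, the following C sketch estimates the time of a naive direct exchange under an assumed linear cost model. It is not the algorithm or the communication model of the paper. The `cost_model` type, the function `direct_exchange_time`, the per-message cost `ts + n * tb`, and the assumption that each of the `p - 1` steps lasts as long as its longest message are all hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical linear cost model (an assumption, not taken from the paper):
 * sending a message of n bytes costs ts + n * tb. */
typedef struct {
    double ts; /* start-up cost per message */
    double tb; /* transfer cost per byte */
} cost_model;

/* Estimate the time of a naive direct exchange among p processors.
 * len is a row-major p x p matrix: len[i * p + j] is the number of bytes
 * processor i sends to processor j. In step k (1 <= k < p), processor i
 * sends to processor (i + k) % p. Each step is assumed to take as long as
 * its longest message, so one oversized message delays every processor. */
double direct_exchange_time(const size_t *len, size_t p, cost_model m)
{
    assert(len != NULL && p > 0);
    double total = 0.0;
    for (size_t k = 1; k < p; k++) {
        size_t longest = 0;
        for (size_t i = 0; i < p; i++) {
            size_t n = len[i * p + (i + k) % p];
            if (n > longest) {
                longest = n;
            }
        }
        /* One start-up per step plus the transfer time of the longest message. */
        total += m.ts + (double)longest * m.tb;
    }
    return total;
}
```

Under this assumed model, two traffic patterns with the same total volume can have very different estimated times: the more the message lengths vary, the more each step is dominated by its longest message. This is one way to see why smoothing out message lengths can help.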