The complete-exchange communication primitive on a distributed-memory multiprocessor requires every processor to send a unique message to every other processor. For circuit-switched hypercube networks there are two well-known schemes for implementing this primitive. Direct Exchange minimizes communication volume but maximizes startup costs, while Standard Exchange minimizes startup costs at the price of higher communication volume. This paper analyzes a hybrid, which can be thought of as a sequence of Direct Exchange phases applied to variable-sized subcubes, and examines the problem of determining the optimal subcube dimension d_i for every phase. We show that optimal performance is achieved using some equi-partition, where |d_i − d_j| ≤ 1 for all phases i and j. We study the behavior of the optimal partition as a function of machine communication parameters, hypercube dimension, and message size, and show that the optimal partition can be determined with no more than 2d+1 comparisons. Finally, we validate the model empirically and, for certain problem instances, observe as much as a factor-of-two improvement over the other methods.
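The equi-partition idea can be illustrated with a short sketch. The code below enumerates, for a hypercube of dimension d, the one equi-partition that exists for each phase count k (parts differing by at most 1) and picks the cheapest under a cost function. The function names and the cost model `toy_cost` are assumptions made for illustration: the model charges each phase 2^(d_i) − 1 sends of aggregated blocks and ignores distance and data-shuffling effects. It is not the model analyzed in the paper, and the exhaustive scan over k is not the paper's 2d+1 comparison procedure.

```python
from typing import Callable, List, Sequence


def equi_partition(d: int, k: int) -> List[int]:
    """Split d into k positive parts whose sizes differ by at most 1."""
    if not 1 <= k <= d:
        raise ValueError("need 1 <= k <= d")
    q, r = divmod(d, k)
    # r parts get the extra unit, so the parts sum to d.
    return [q + 1] * r + [q] * (k - r)


def toy_cost(parts: Sequence[int], startup: float, per_byte: float,
             msg_bytes: float) -> float:
    """Hypothetical cost model (an assumption, not the paper's).

    A phase on subcubes of dimension d_i performs 2**d_i - 1 sends,
    each carrying msg_bytes * 2**(d - d_i) bytes of aggregated blocks.
    """
    d = sum(parts)
    return sum(
        (2 ** di - 1) * (startup + per_byte * msg_bytes * 2 ** (d - di))
        for di in parts
    )


def best_equi_partition(d: int,
                        cost: Callable[[Sequence[int]], float]) -> List[int]:
    """Scan the d equi-partitions (k = 1..d) and return the cheapest."""
    candidates = [equi_partition(d, k) for k in range(1, d + 1)]
    return min(candidates, key=cost)


if __name__ == "__main__":
    d = 6
    for m in (1, 64, 4096):
        best = best_equi_partition(
            d, lambda p: toy_cost(p, startup=100.0, per_byte=1.0, msg_bytes=m)
        )
        print(f"message size {m}: phases {best}")
```

Under this toy model, a single phase with d_1 = d reduces to Direct Exchange and d phases with every d_i = 1 reduce to Standard Exchange, so the scan covers both extremes and the hybrids between them.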