Optimizing User-Level Communication Patterns on the Fujitsu AP3000

Authors:
Jeremy Dawson; Peter Strazdins
Venue:
IWCC '99 Proceedings of the 1st IEEE Computer Society International Workshop on Cluster Computing
Year:
1999

Abstract

In this paper, we present techniques and algorithms to improve the performance of various communication patterns on message-passing platforms where, for reasons of safety, user-level communications must be buffered in (special) memory on both the sending and the receiving side. These algorithms can both minimize message copying and overlap the copying to and from the special memory with the actual transfer, enabling full bandwidth to be achieved. The patterns include tree broadcasts and reductions, (ring-based) multiple broadcasts and reductions, pipelined broadcasts, and buffered point-to-point sends. In each case, the messages may have a simple stride. All of these patterns are used in dense linear algebra applications, although they are also used in many other contexts.

These algorithms are implemented and their performance evaluated on the Fujitsu AP3000, a message-passing multicomputer having many characteristics of the cluster model. Some aspects, such as the performance characteristics of the special memory, are specific to the AP3000; however, the algorithms still apply to any platform using a similar mode of user-level communication. Worthwhile performance increases are obtained, especially for patterns involving moderate to large numbers of processors.
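The benefit of overlapping the copies to and from special memory with the network transfer can be illustrated with a simple timing model. The sketch below is an idealized assumption, not the authors' implementation. A message is split into chunks, and each chunk passes through three stages: copy into send-side special memory, network transfer, and copy out of receive-side special memory. Different stages can work on different chunks at the same time. The function name `transfer_time`, the bandwidth parameters, and the per-chunk `latency` are hypothetical, and limits on buffer space in the special memory are ignored.

```python
import math


def transfer_time(msg_bytes: int, chunk_bytes: int, copy_bw: float,
                  link_bw: float, latency: float = 0.0) -> float:
    """Idealized completion time of a chunked, three-stage send.

    Stages: copy into send-side special memory, network transfer
    (plus a hypothetical per-chunk latency), copy out of receive-side
    special memory. Each stage processes one chunk at a time, and
    different stages may work on different chunks concurrently.
    Assumes unlimited buffering between stages.
    """
    n_chunks = math.ceil(msg_bytes / chunk_bytes)
    # finish[s] is the time at which stage s completed its latest chunk.
    finish = [0.0, 0.0, 0.0]
    remaining = msg_bytes
    for _ in range(n_chunks):
        size = min(chunk_bytes, remaining)
        remaining -= size
        durations = [size / copy_bw, latency + size / link_bw, size / copy_bw]
        ready = 0.0  # the chunk is available to the first stage at time 0
        for s, d in enumerate(durations):
            # A stage starts once the chunk has left the previous stage
            # and the stage itself has finished its previous chunk.
            start = max(ready, finish[s])
            finish[s] = start + d
            ready = finish[s]
    return finish[-1]


# Unpipelined: the whole message goes through each stage in turn.
print(transfer_time(1000, 1000, copy_bw=100.0, link_bw=50.0))  # 40.0
# Ten chunks: the copies now overlap the transfer.
print(transfer_time(1000, 100, copy_bw=100.0, link_bw=50.0))   # 22.0
```

In this model, when the link is the slowest stage and `latency` is zero, shrinking the chunk size drives the total time towards `msg_bytes / link_bw`, i.e. full link bandwidth. A nonzero per-chunk latency makes very small chunks pay the startup cost many times, so a moderate chunk size is preferable.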
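For tree broadcasts, one common choice is a binomial tree, in which the number of processes holding the message doubles each round. This tree shape is an assumption, since the abstract does not specify it. The hypothetical helper `binomial_bcast_schedule` below only computes the communication schedule as (sender, receiver) pairs per round. It does not perform any message passing.

```python
def binomial_bcast_schedule(p: int, root: int = 0) -> list[list[tuple[int, int]]]:
    """Rounds of (sender, receiver) pairs for a binomial-tree broadcast.

    Ranks are shifted so that the root behaves as relative rank 0.
    After round r, relative ranks 0 .. min(2**(r+1), p) - 1 hold the message.
    """
    rounds = []
    have = 1  # number of relative ranks already holding the message
    while have < p:
        step = []
        for rel_src in range(have):
            rel_dst = rel_src + have
            if rel_dst < p:
                step.append(((rel_src + root) % p, (rel_dst + root) % p))
        rounds.append(step)
        have *= 2
    return rounds


for r, step in enumerate(binomial_bcast_schedule(6, root=2)):
    print(f"round {r}: {step}")
# round 0: [(2, 3)]
# round 1: [(2, 4), (3, 5)]
# round 2: [(2, 0), (3, 1)]
```

The schedule has ceil(log2 p) rounds. If the rounds are taken in reverse order with each pair's direction reversed, the result is a matching tree reduction: each receiver combines the incoming partial result with its own before passing it on.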
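A ring-based multiple broadcast lets every process broadcast its own block to all others. The sketch below assumes a common ring formulation and is not a reproduction of the paper's algorithm. In each step, every process forwards to its right neighbour the block it received in the previous step, so after p - 1 steps every process holds all p blocks. The function `ring_multi_bcast` is hypothetical and simulates the exchanges within a single Python process.

```python
def ring_multi_bcast(blocks: list) -> list[list]:
    """Simulate a ring-based multiple broadcast (all-gather).

    Process i starts with blocks[i]. In step s it sends block (i - s) mod p,
    which it received in step s - 1 (or owns, when s == 0), to process
    (i + 1) mod p. All sends in a step happen simultaneously.
    """
    p = len(blocks)
    held = [{i: blocks[i]} for i in range(p)]  # per-process block store
    for step in range(p - 1):
        # Collect every send first so the step behaves like a parallel exchange.
        sends = [((i + 1) % p, (i - step) % p, held[i][(i - step) % p])
                 for i in range(p)]
        for dst, idx, data in sends:
            held[dst][idx] = data
    return [[held[i][j] for j in range(p)] for i in range(p)]


print(ring_multi_bcast(["a", "b", "c", "d"]))
# [['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd']]
```

Each process sends and receives exactly one block per step, so every link in the ring stays busy throughout. A ring-based reduction follows a similar step pattern. Instead of storing each incoming block, a process adds it into a local partial result before forwarding it.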
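Messages with a simple stride, such as a column of a row-major matrix, must be gathered into contiguous storage before the transfer and scattered afterwards. The minimal sketch below assumes that a flat Python list stands in for both the user buffer and the special memory. The helpers `pack_strided` and `unpack_strided` are hypothetical.

```python
def pack_strided(src: list, offset: int, count: int, stride: int) -> list:
    """Gather count elements starting at src[offset], stride apart, into a contiguous buffer."""
    return [src[offset + k * stride] for k in range(count)]


def unpack_strided(buf: list, dst: list, offset: int, stride: int) -> None:
    """Scatter the contiguous buffer back into dst with the same stride."""
    for k, value in enumerate(buf):
        dst[offset + k * stride] = value


# A 3 x 4 matrix stored row-major: column 1 has offset 1 and stride 4.
a = list(range(12))
col = pack_strided(a, offset=1, count=3, stride=4)
print(col)  # [1, 5, 9]

b = [0] * 12
unpack_strided(col, b, offset=1, stride=4)
print(b)  # [0, 1, 0, 0, 0, 5, 0, 0, 0, 9, 0, 0]
```

In a real implementation, the gather would presumably write straight into the special memory. The strided copy would then double as the copy into the send buffer, rather than adding a separate one.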