Extending Collective Operations with Application Semantics for Improving Multi-Cluster Performance

  • Authors: Lars Ailo Bongo; Otto Anshus; John Markus Bjørndalen; Tore Larsen

  • Affiliations: University of Tromsø (all four authors)

  • Venue: ISPDC '04 Proceedings of the Third International Symposium on Parallel and Distributed Computing/Third International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks

  • Year: 2004

Abstract

We identify two ways of increasing the performance of allreduce-style collective operations in a multi-cluster with large WAN latencies: (i) hiding latency in system noise, and (ii) conditional-allreduce, where knowledge about the application is used to reduce the number of WAN messages. In our multi-cluster, system noise was not large enough to hide the WAN latency. However, the latency could be hidden using conditional-allreduce, since on many iterations only cluster-local values were needed, and many of the values needed from other clusters were prefetched. A speedup of 2.4 was achieved for a microbenchmark. Prefetching introduced a small overhead in the cluster with the slowest hosts.
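
The idea behind conditional-allreduce can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function names (`conditional_allreduce`, `count_wan_messages`), the `needs_global` predicate, and the message-counting model (one WAN message per non-root cluster to a root, plus one back) are all hypothetical, and prefetching is not modeled. It assumes every cluster can evaluate the predicate locally, for example from the iteration number.

```python
from functools import reduce
from typing import Callable, List, Tuple


def conditional_allreduce(
    cluster_values: List[List[float]],
    iteration: int,
    needs_global: Callable[[int], bool],
    op: Callable[[float, float], float] = max,
) -> Tuple[List[float], int]:
    """Simulate one allreduce step; return (result seen by each cluster, WAN messages).

    Hypothetical model: each inner list holds the values of the hosts in one cluster.
    """
    # Step 1: every cluster reduces its own values over the local network.
    local = [reduce(op, values) for values in cluster_values]

    # Step 2: application knowledge (assumed here to be a predicate on the
    # iteration number) says whether remote values are needed this iteration.
    if not needs_global(iteration):
        return local, 0  # cluster-local results only, no WAN traffic

    # Step 3: combine the per-cluster results across the WAN.
    # Assumed cost: one message from each non-root cluster to a root and one back.
    global_value = reduce(op, local)
    wan_messages = 2 * (len(local) - 1)
    return [global_value] * len(local), wan_messages


def count_wan_messages(
    cluster_values: List[List[float]],
    iterations: int,
    needs_global: Callable[[int], bool],
) -> int:
    """Total WAN messages over a number of iterations under the model above."""
    return sum(
        conditional_allreduce(cluster_values, i, needs_global)[1]
        for i in range(iterations)
    )
```

With three clusters and a predicate that requests the global value only on every fifth iteration, ten iterations cost 8 WAN messages in this model, against 40 when every iteration performs the full allreduce. The figures describe the toy model only and say nothing about the speedup reported in the abstract.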