The performance of the cedar multistage switching network

  • Authors:
  • Josep Torrellas;Zheng Zhang

  • Affiliations:
  • University of Illinois at Urbana-Champaign, IL;University of Illinois at Urbana-Champaign, IL

  • Venue:
  • Proceedings of the 1994 ACM/IEEE conference on Supercomputing
  • Year:
  • 1994

Quantified Score

Hi-index 0.00

Visualization

Abstract

While multistage switching networks for vector multiprocessors have been studied extensively, detailed evaluations of their performance are rare. Indeed, analytical models, simulations with pseudo-synthetic loads, studies focused on average-value parameters, and measurements of networks disconnected from the machine, all provide limited information. In this paper, instead, we present an in-depth empirical analysis of a multistage switching network in a realistic setting: we use hardware probes to examine the performance of the omega network of the Cedar shared-memory machine executing real applications. The machine is configured with 16 vector processors.The analysis suggests that the performance of multistage switching networks is limited by traffic non-uniformities. We identify two major non-uniformities that degrade Cedar's performance and are likely to slow down other networks too. The first one is the contention caused by the return messages in a vector access as they converge from the memories to one processor port. This traffic convergence penalizes vector reads and, more importantly, causes tree saturation. The second non-uniformity is the uneven contention delays induced by even a relatively fair scheme to resolve message collisions. Based on our observations, we argue that intuitive optimizations for multistage switching networks may not be cost-effective. Instead, we suggest changes to increase the network bandwidth at the root of the traffic convergence tree and to delay traffic convergence up until the final stages of the network.