A framework for the design, synthesis and cycle-accurate simulation of multiprocessor networks

  • Authors:
  • Raymond R. Hoare;Zhu Ding;Shenchih Tung;Rami Melhem;Alex K. Jones

  • Affiliations:
  • Department of Electrical and Computer Engineering, University of Pittsburgh, 3700 O'Hara Street, 348 Benedum Hall, Pittsburgh, PA 15261, USA;Department of Electrical and Computer Engineering, University of Pittsburgh, 3700 O'Hara Street, 348 Benedum Hall, Pittsburgh, PA 15261, USA;Department of Electrical and Computer Engineering, University of Pittsburgh, 3700 O'Hara Street, 348 Benedum Hall, Pittsburgh, PA 15261, USA;Department of Computer Science, University of Pittsburgh, 6137 Sennott Square, Pittsburgh, PA 15261, USA;Department of Electrical and Computer Engineering, University of Pittsburgh, 3700 O'Hara Street, 348 Benedum Hall, Pittsburgh, PA 15261, USA

  • Venue:
  • Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I
  • Year:
  • 2005

Abstract

This paper introduces a framework for the design, synthesis and cycle-accurate simulation of parallel computing networks of 128 or more processors. To characterize the network accurately, we present a bottom-up design methodology in which each component is designed in a hardware description language and synthesized to an FPGA to estimate the performance of the final ASIC implementation. The components are then integrated to form a parallel computing network and simulated using a cycle-accurate simulator, with network traffic described by command files. This enables us to simulate various switching techniques, three of which are presented in this paper: wormhole switching, circuit switching and a newly introduced technique called predictive circuit switching. In our experiments, four representative traffic patterns are generated for simulation and, to show the flexibility of the model, we vary the cable lengths, and thus the cable latencies, in all four test cases. Our results show that this hardware design, synthesis and cycle-accurate simulation methodology is useful for evaluating design tradeoffs in parallel networks. A non-blocking queue with up to 128 internal queues and a real-time bandwidth scheduler for up to 128 ports were designed in hardware, and their synthesis results are presented. From our network simulation results, we conclude that predictive circuit switching exceeds the performance of packet switching for highly predictable traffic, such as collective communications, and for heavily loaded unpredictable traffic with small packet sizes. As expected, predictive circuit switching significantly underperforms both packet and circuit switching for unpredictable traffic.