A framework for the design, synthesis and cycle-accurate simulation of multiprocessor networks

Authors:
Raymond R. Hoare;Zhu Ding;Shenchih Tung;Rami Melhem;Alex K. Jones
Affiliations:
Department of Electrical and Computer Engineering, University of Pittsburgh, 3700 O'Hara Street, 348 Benedum Hall, Pittsburgh, PA 15261, USA;Department of Electrical and Computer Engineering, University of Pittsburgh, 3700 O'Hara Street, 348 Benedum Hall, Pittsburgh, PA 15261, USA;Department of Electrical and Computer Engineering, University of Pittsburgh, 3700 O'Hara Street, 348 Benedum Hall, Pittsburgh, PA 15261, USA;Department of Computer Science, University of Pittsburgh, 6137 Sennott Square, Pittsburgh, PA 15261, USA;Department of Electrical and Computer Engineering, University of Pittsburgh, 3700 O'Hara Street, 348 Benedum Hall, Pittsburgh, PA 15261, USA
Venue:
Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I
Year:
2005

Abstract

This paper introduces a framework for the design, synthesis and cycle-accurate simulation of parallel computing networks of 128 or more processors. To characterize the network accurately, we present a bottom-up design methodology in which each component is designed in a hardware description language and synthesized to an FPGA to estimate the performance of the final ASIC implementation. The components are then integrated to form a parallel computing network and simulated with a cycle-accurate simulator, with network traffic described by command files. This enables us to simulate various switching techniques, three of which are presented in this paper: wormhole switching, circuit switching and a newly introduced technique called predictive circuit switching. In our experiments, four representative traffic patterns are generated for simulation and, to show the flexibility of the model, we vary the cable lengths, and thus their latency, for all four test cases. Our results show that this hardware design, synthesis and cycle-accurate simulation methodology provides a useful method for evaluating design tradeoffs in parallel networks. A non-blocking queue with up to 128 internal queues and a real-time bandwidth scheduler for up to 128 ports were designed in hardware, and their synthesis results are presented. From our network simulation results, we conclude that predictive circuit switching outperforms packet switching for highly predictable traffic, such as collective communications, and for heavily loaded unpredictable traffic with small packet sizes. As expected, predictive circuit switching significantly underperforms both packet and circuit switching for unpredictable traffic.
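
The abstract notes that cable lengths are varied, and with them the link latency seen by the cycle-accurate simulator. As a rough illustration only, the sketch below shows one way a cable length could be converted into a whole number of clock cycles. The propagation delay of about 5 ns per metre, the 100 MHz clock, and the function name `cable_latency_cycles` are assumptions made for this example and are not taken from the paper.

```python
import math

# Assumed values for illustration; the paper's actual parameters are not given here.
PROPAGATION_NS_PER_M = 5.0   # about two thirds of the speed of light, typical for cable
CLOCK_MHZ = 100.0            # hypothetical network clock frequency


def cable_latency_cycles(length_m, clock_mhz=CLOCK_MHZ, ns_per_m=PROPAGATION_NS_PER_M):
    """Return the whole number of clock cycles a signal needs to traverse a cable."""
    if length_m < 0:
        raise ValueError("cable length must be non-negative")
    cycle_ns = 1000.0 / clock_mhz          # duration of one clock cycle in nanoseconds
    delay_ns = length_m * ns_per_m         # one-way propagation delay of the cable
    # A cycle-accurate model can only delay by whole cycles, so round up.
    return math.ceil(delay_ns / cycle_ns)


if __name__ == "__main__":
    for length in (0, 2, 3, 10):
        print(length, "m ->", cable_latency_cycles(length), "cycles")
```

Rounding up reflects that a signal arriving partway through a cycle is not usable until the next clock edge. A different model could round differently or add fixed serialization overhead.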