This paper introduces a framework for the design, synthesis, and cycle-accurate simulation of parallel computing networks with 128 or more processors. To characterize the network accurately, we present a bottom-up design methodology in which each component is designed in a hardware description language and synthesized to an FPGA to estimate the performance of the final ASIC implementation. The components are then integrated to form a parallel computing network and simulated using a cycle-accurate simulator, with network traffic described by command files. This enables us to simulate various switching techniques, three of which are presented in this paper: wormhole switching, circuit switching, and a newly introduced technique called predictive circuit switching. In our experiments, we generate four representative traffic patterns and, to show the flexibility of the model, vary the cable lengths, and thus the link latency, for all four test cases. Our results show that this hardware design, synthesis, and cycle-accurate simulation methodology is a useful way to evaluate design tradeoffs in parallel networks. A non-blocking queue with up to 128 internal queues and a real-time bandwidth scheduler for up to 128 ports were designed in hardware, and their synthesis results are presented. From our network simulation results, we conclude that predictive circuit switching outperforms packet switching for highly predictable traffic, such as collective communications, and for heavily loaded unpredictable traffic with small packet sizes. As expected, predictive circuit switching significantly underperforms both packet and circuit switching for unpredictable traffic.
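As a rough illustration of how cable length might translate into link latency in a cycle-accurate model, the sketch below rounds a cable's one-way propagation delay up to whole clock cycles. The function name, the 5 ns/m propagation figure (a typical value for copper cable), and the 100 MHz clock in the usage lines are assumptions made for illustration; none of them are taken from the paper.

```python
import math


def cable_latency_cycles(length_m, clock_hz, ns_per_m=5.0):
    """Return the one-way cable delay, rounded up to whole clock cycles.

    ns_per_m is an assumed propagation delay per metre of cable; it is
    a hypothetical default, not a parameter reported by the paper.
    """
    delay_ns = length_m * ns_per_m   # total propagation delay of the cable
    period_ns = 1e9 / clock_hz       # duration of one clock cycle
    return math.ceil(delay_ns / period_ns)


# Hypothetical usage: 10 m and 25 m cables on a 100 MHz network clock.
short_link = cable_latency_cycles(10, 100e6)
long_link = cable_latency_cycles(25, 100e6)
```

Rounding up reflects that a cycle-accurate simulator can only deliver data on a clock edge, so any partial cycle of propagation delay costs a full cycle.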