The performance of an interconnection network in a massively parallel architecture is subject to physical constraints whose impact needs to be re-evaluated from time to time. Fat-trees and low-dimensional cubes have attracted great interest in the scientific community in the last few years and are emerging standards in the design of interconnection networks for massively parallel computers. In this paper we compare the communication performance of these two classes of interconnection networks using a detailed simulation model. The comparison uses a set of synthetic benchmarks and takes into account physical constraints, such as pin and bandwidth limitations, as well as router complexity. In our experiments we consider two networks with 256 nodes: a 16-ary 2-cube and a 4-ary 4-tree.
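
As a quick check on the two configurations named above, the sketch below computes their sizes from the standard definitions: a k-ary n-cube has k^n nodes, and a k-ary n-tree has k^n processing nodes connected by n * k^(n-1) switches. The function names are illustrative only and do not come from the paper or its simulator.

```python
def cube_nodes(k: int, n: int) -> int:
    """Nodes in a k-ary n-cube: k nodes along each of n dimensions."""
    return k ** n


def tree_processing_nodes(k: int, n: int) -> int:
    """Processing nodes (leaves) in a k-ary n-tree."""
    return k ** n


def tree_switches(k: int, n: int) -> int:
    """Switches in a k-ary n-tree: n levels of k^(n-1) switches each."""
    return n * k ** (n - 1)


if __name__ == "__main__":
    # The two 256-node networks compared in the experiments.
    print("16-ary 2-cube nodes:", cube_nodes(16, 2))
    print("4-ary 4-tree processing nodes:", tree_processing_nodes(4, 4))
    print("4-ary 4-tree switches:", tree_switches(4, 4))
```

Both topologies attach the same number of processing nodes, which is what makes the comparison like-for-like. In the fat-tree the switches are separate from the processing nodes, whereas in the cube each node is normally paired with its own router.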