Design and implementation of the NUMAchine multiprocessor
DAC '98 Proceedings of the 35th annual Design Automation Conference
Optimal Clustering of Hierarchical Hyper-Ring Multicomputers
The Journal of Supercomputing
Hierarchical Ring Network Configuration and Performance Modeling
IEEE Transactions on Computers
Performance and Configuration of Hierarchical Ring Networks for Multiprocessors
ICPP '97 Proceedings of the international Conference on Parallel Processing
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
A Tree Based Router Search Engine Architecture with Single Port Memories
Proceedings of the 32nd annual international symposium on Computer Architecture
A Hybrid Ring/Mesh Interconnect for Network-on-Chip Using Hierarchical Rings for Global Routing
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Modeling and evaluation of ring-based interconnects for Network-on-Chip
Journal of Systems Architecture: the EUROMICRO Journal
Hierarchical circuit-switched NoC for multicore video processing
Microprocessors & Microsystems
Computers and Electrical Engineering
On-chip ring network designs for hard-real time systems
Proceedings of the 21st International conference on Real-Time Networks and Systems
TornadoNoC: A lightweight and scalable on-chip network architecture for the many-core era
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
This paper compares the performance of hierarchical ring- and mesh-connected wormhole routed shared memory multiprocessor networks in a simulation study. Hierarchical rings are interesting alternatives to meshes since i) they can be clocked at faster rates, ii) they can have wider data paths and hence shorter message sizes, iii) they allow addition and removal of processing nodes at arbitrary locations, iv) their topology allows natural exploitation in the spatial locality of application memory access patterns, and v) their topology allows efficient implementation of broadcasts. Our study shows that for workloads with little locality, meshes scale better than ring networks because ring-based systems have limited bisection bandwidth. However, for workloads with some memory access locality, hierarchical rings outperform meshes by 20-40% for system sizes of up to 128 processors. Even with poor access locality, hierarchical rings will outperform meshes for these system sizes if the mesh router buffers are only 1-flit large, and they will outperform meshes in systems with less than 36 processors regardless of mesh router buffer size.