Fat-trees: universal networks for hardware-efficient supercomputing
IEEE Transactions on Computers
Hector: A Hierarchically Structured Shared-Memory Multiprocessor
Computer - Special issue on experimental research in computer architecture
The DASH prototype: implementation and performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Scalable cache consistency for hierarchically structured multiprocessors
The Journal of Supercomputing
Comparative Modeling and Evaluation of CC-NUMA and COMA on Hierarchical Ring Architectures
IEEE Transactions on Parallel and Distributed Systems
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
STiNG: a CC-NUMA computer system for the commercial marketplace
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Design and implementation of the NUMAchine multiprocessor
DAC '98 Proceedings of the 35th annual Design Automation Conference
Hierarchical Ring Network Configuration and Performance Modeling
IEEE Transactions on Computers
Proceedings of the 2001 ACM symposium on Applied computing
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Scalable Parallel Computing: Technology,Architecture,Programming
Scalable Parallel Computing: Technology,Architecture,Programming
Performance Evaluation of Hierarchical Ring-Based Shared Memory Multiprocessors
IEEE Transactions on Computers
Performance Evaluation of the Slotted Ring Multiprocessor
IEEE Transactions on Computers
IEEE Transactions on Parallel and Distributed Systems
Performance and Configuration of Hierarchical Ring Networks for Multiprocessors
ICPP '97 Proceedings of the international Conference on Parallel Processing
On Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-Memory Multiprocessors.
HIPC '98 Proceedings of the Fifth International Conference on High Performance Computing
A Performance Comparison of Hierarchical Ring- and Mesh- Connected Multiprocessor Networks
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
On some architectural issues of optical hierarchical ring networks for shared-memory multiprocessors
MPPOI '95 Proceedings of the Second Workshop on Massively Parallel Processing Using Optical Interconnections
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
SPLASH: Stanford parallel applications for shared-memory
SPLASH: Stanford parallel applications for shared-memory
Performance issues in the design of hierarchical-ring and direct networks for shared-memory multiprocessors
DRACO: optimized CC-NUMA system with novel dual-link interconnections to reduce the memory latency
MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications , systems and architecture
Comparison of Mesh and Hierarchical Networks for Multiprocessors
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
On the impact of the migration topology on the Island Model
Parallel Computing
Hi-index | 0.00 |
In multiprocessor systems, interconnection network design is critical for overall system performance. Among the popular interconnection networks, unidirectional ring-based networks have been one of popular choices for high performance large-scale shared memory multiprocessor systems. In this paper, we propose ''Torus Ring'', which is a modified version of two-level hierarchical ring. The Torus Ring has the same complexity as the hierarchical rings, and the only difference is the way it connects the local rings. Compared to hierarchical rings, the Torus Ring helps exploit the memory access locality of application programs more efficiently. It has an advantage over the hierarchical ring when the destination of a packet is the adjacent local ring, especially the backward adjacent local ring. Although we assume that the destination of a network packet is uniformly distributed across the processing nodes, the average number of hops in Torus Ring is equal to that of the hierarchical ring. However, the performance gain of the Torus Ring is expected to increase, due to the memory access locality of the application programs in the real parallel programming environment. In the simulation results, the latency of the interconnection network is reduced by up to 19% and the execution time is reduced by up to 10%, with the moderate ring utilization ratio.