This paper proposes a semi-completely-connected bus, called SKB, that alleviates the long-wire and pin-neck problems of on-chip systems through a small diameter and dynamic clustering. Compared with static clustering, which is fixed in hardware, dynamic clustering reduces the traffic to per-cluster units such as the global interconnect interface. We derive a 2n-node semi-complete (SK) graph from a simple node partitioning. An SKB is obtained from the SK graph by replacing the links incident to each node with a single bus for that node. The diameter of the SKB is 1 (one bus step), although the bus is rather long, O(√2n). Simulation results show that, relative to a hypercube with a link delay of 1 clock, the SKB's bandwidth is about 0.97 and 0.14 for bus delays of 1 and 8 clocks, respectively; with dynamic clustering, these figures increase to about 4.57 and 0.71.
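As a rough reading aid, the reported bandwidth ratios can be turned into the approximate gain attributable to dynamic clustering at each bus delay. The sketch below uses only the four rounded figures quoted in the abstract; the names `relative_bandwidth` and `clustering_gain` are illustrative and do not come from the paper, and the resulting factors are only as precise as the rounded inputs.

```python
# Bandwidth of the SKB relative to a hypercube with a 1-clock link delay,
# as quoted in the abstract (rounded values, keyed by bus delay in clocks).
relative_bandwidth = {
    1: {"static": 0.97, "dynamic": 4.57},
    8: {"static": 0.14, "dynamic": 0.71},
}


def clustering_gain(static: float, dynamic: float) -> float:
    """Return how many times higher the bandwidth is with dynamic clustering."""
    return dynamic / static


# Gain from dynamic clustering for each bus delay.
gains = {
    delay: clustering_gain(values["static"], values["dynamic"])
    for delay, values in relative_bandwidth.items()
}

for delay, gain in gains.items():
    print(f"bus delay {delay} clock(s): about {gain:.2f}x with dynamic clustering")
```

Under these rounded inputs, dynamic clustering raises the relative bandwidth by a factor of roughly 4.7 at a 1-clock bus delay and roughly 5.1 at an 8-clock bus delay.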