This paper proposes a bus-based cube-type network, called psi-cube, that alleviates two problems facing on-chip systems: long wires, through a small diameter, and a limited number of I/O pins, through dynamic clusters. The $2^n$-node psi-cube is built on the sets of node partitions produced by an extended $n$-bit Hamming code $\psi(n,k)$ [M. Takesue, Ψ-Cubes: recursive bused fat-hypercubes for multilevel snoopy caches, in: Proceedings of the International Symposium on Parallel Architectures, Algorithms, and Networks, IEEE CS Press, 1999, pp. 62-67], with the nodes of each partition connected to a bus owned by the partition's leader. Because routing takes place between leaders separated by a distance of 1 to 3, the diameter equals $\lceil n/2 \rceil$ if $n \neq 2^p - 1$ and $\lfloor n/2 \rfloor$ otherwise. When the psi-cube is mapped onto an array, the maximum bus length is $O(2^{p-1})$ or $O(2^{k-1})$. We dynamically produce separate sets of clusters for different off-chip targets, such as memory blocks, so the traffic to the cluster leaders is much lower than with static clusters fixed in hardware. Simulation results show that the psi-cube outperforms the mesh when the bus delay is less than 4 times the mesh link delay, and that the dynamic clusters increase the psi-cube's bandwidth by more than 60%.
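The partitioning idea can be illustrated with a small sketch. It does not reproduce the author's extended code ψ(n,k), whose construction is not given here. Instead it assumes the textbook perfect Hamming code for $n = 2^p - 1$: each codeword acts as a partition leader, and the nodes within Hamming distance 1 of it form the partition that would share the leader's bus. In this code, distinct leaders are at least distance 3 apart. The function names `syndrome`, `leader`, and `partitions` are hypothetical.

```python
# Hypothetical sketch: split the 2^n nodes of a binary n-cube into bus
# partitions led by codewords of a standard (perfect) Hamming code.
# Assumption: n = 2^p - 1, so every node is within Hamming distance 1 of
# exactly one codeword. This is NOT the author's extended code psi(n, k).


def syndrome(node: int, n: int) -> int:
    """XOR of the 1-based positions of the set bits of `node`."""
    s = 0
    for pos in range(1, n + 1):
        if (node >> (pos - 1)) & 1:
            s ^= pos
    return s


def leader(node: int, n: int) -> int:
    """Nearest codeword: flip the bit the nonzero syndrome points at."""
    s = syndrome(node, n)
    return node if s == 0 else node ^ (1 << (s - 1))


def partitions(p: int) -> dict[int, list[int]]:
    """Map each leader (codeword) to the nodes that would share its bus."""
    n = (1 << p) - 1
    groups: dict[int, list[int]] = {}
    for node in range(1 << n):
        groups.setdefault(leader(node, n), []).append(node)
    return groups


if __name__ == "__main__":
    groups = partitions(3)  # n = 7: a 128-node cube
    # 2^(n-p) leaders, each owning a bus shared by n + 1 = 2^p nodes
    print(len(groups), {len(v) for v in groups.values()})  # 16 {8}
```

With a perfect code the partitions tile the cube exactly. For other values of $n$ a shortened or extended code would be needed, which is where the paper's $\psi(n,k)$ and its two bus-length cases come in.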