Deadlock-Free Message Routing in Multiprocessor Interconnection Networks
IEEE Transactions on Computers
A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks
IEEE Transactions on Parallel and Distributed Systems
ROMM routing on mesh and torus networks
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Worst-case traffic for oblivious routing functions
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Universal schemes for parallel communication
STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing
A large scale, homogeneous, fully distributed parallel machine, I
ISCA '77 Proceedings of the 4th annual symposium on Computer architecture
Dynamic Voltage Scaling with Links for Power Optimization of Interconnection Networks
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Power-driven Design of Router Microarchitectures in On-chip Networks
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
3D Processing Technology and Its Impact on iA32 Microprocessors
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Near-Optimal Worst-Case Throughput Routing for Two-Dimensional Mesh Networks
Proceedings of the 32nd annual international symposium on Computer Architecture
Demystifying 3D ICs: The Pros and Cons of Going Vertical
IEEE Design & Test
Design and Management of 3D Chip Multiprocessors Using Network-in-Memory
Proceedings of the 33rd annual international symposium on Computer Architecture
PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
The increasing viability of 3-D silicon integration technology has opened new opportunities for chip architecture innovations. One direction is the extension of 2-D mesh-based tiled chip-multiprocessor architectures into three dimensions. This paper focuses on efficient routing algorithms for such 3-D mesh networks. Existing routing algorithms suffer from either poor worst-case throughput (DOR, ROMM) or poor latency (VAL). Although the minimal routing algorithm O1TURN already achieves near-optimal worst-case throughput for 2-D mesh networks, the optimality result does not extend to higher dimensions: for 3-D and higher-dimensional meshes, the worst-case throughput of O1TURN degrades substantially. The main contribution of this paper is a new oblivious routing algorithm for 3-D mesh networks called randomized partially-minimal (RPM) routing. RPM provably achieves optimal worst-case throughput for 3-D meshes when the network radix k is even, and comes within a factor of 1/k^2 of optimal worst-case throughput when k is odd. Moreover, whereas VAL achieves optimal worst-case throughput at a penalty factor of 2 in average latency over DOR, RPM achieves (near-)optimal worst-case throughput with a much smaller factor of 1.33. For practical asymmetric 3-D mesh configurations, where the number of device layers is smaller than the number of tiles along the edge of a layer, the average latency of RPM drops to a factor of just 1.11 to 1.19 over DOR. Additionally, a variant of RPM called randomized minimal first (RMF) routing is proposed, which leverages the inherent load-balancing properties of the network traffic to further reduce packet latency without compromising throughput.
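The latency factors quoted above can be sanity-checked with a back-of-the-envelope hop-count model. The sketch below is illustrative only and rests on assumptions that the abstract does not state: traffic is uniform random, latency is proportional to hop count, VAL detours through a uniformly random intermediate node, and RPM detours only in the vertical dimension through a uniformly random intermediate layer while routing minimally within a layer. The function names (`mean_distance`, `dor_hops`, `val_hops`, `rpm_hops`) are hypothetical.

```python
def mean_distance(n: int) -> float:
    """Average |i - j| over all ordered pairs of positions 0..n-1
    (pairs with i == j included), i.e. uniform random endpoints."""
    total = sum(abs(i - j) for i in range(n) for j in range(n))
    return total / (n * n)


def dor_hops(k: int, layers: int) -> float:
    """Expected hops for minimal dimension-order routing in a
    k x k x layers mesh under uniform random traffic."""
    return 2 * mean_distance(k) + mean_distance(layers)


def val_hops(k: int, layers: int) -> float:
    """Assumed VAL model: source -> random intermediate node -> destination,
    so two independent minimal traversals."""
    return 2 * dor_hops(k, layers)


def rpm_hops(k: int, layers: int) -> float:
    """Assumed RPM model: source layer -> random intermediate layer,
    minimal routing within that layer, then on to the destination layer.
    Only the vertical distance is traversed twice on average."""
    return 2 * mean_distance(k) + 2 * mean_distance(layers)


if __name__ == "__main__":
    for k, layers in [(8, 8), (8, 4), (16, 4)]:
        dor = dor_hops(k, layers)
        print(
            f"{k}x{k}x{layers}: "
            f"VAL/DOR = {val_hops(k, layers) / dor:.2f}, "
            f"RPM/DOR = {rpm_hops(k, layers) / dor:.2f}"
        )
```

Under these assumptions the RPM-to-DOR ratio is 4/3 for a symmetric mesh and falls to roughly 1.19 for 8x8x4 and 1.11 for 16x16x4, which is consistent with the range reported in the abstract; the model says nothing about throughput or contention.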