ROMM routing on mesh and torus networks
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Worst-case traffic for oblivious routing functions
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Universal schemes for parallel communication
STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing
A large scale, homogeneous, fully distributed parallel machine, I
ISCA '77 Proceedings of the 4th annual symposium on Computer architecture
Dynamic Voltage Scaling with Links for Power Optimization of Interconnection Networks
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Near-Optimal Worst-Case Throughput Routing for Two-Dimensional Mesh Networks
Proceedings of the 32nd annual international symposium on Computer Architecture
Demystifying 3D ICs: The Pros and Cons of Going Vertical
IEEE Design & Test
Design and Management of 3D Chip Multiprocessors Using Network-in-Memory
Proceedings of the 33rd annual international symposium on Computer Architecture
PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
A novel dimensionally-decomposed router for on-chip communication in 3D architectures
Proceedings of the 34th annual international symposium on Computer architecture
Tightly-Coupled Multi-Layer Topologies for 3-D NoCs
ICPP '07 Proceedings of the 2007 International Conference on Parallel Processing
MIRA: A Multi-layered On-Chip Interconnect Router Architecture
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
ORION 2.0: a fast and accurate NoC power and area model for early-stage design space exploration
Proceedings of the Conference on Design, Automation and Test in Europe
Recently, a near-optimal oblivious routing algorithm for 3D mesh networks called Randomized Partially-Minimal (RPM) routing was proposed [12]. It works by load-balancing traffic across vertical layers and routing minimally on each horizontal layer. RPM achieves optimal worst-case throughput when the network radix k is even and is within a factor of 1/k^2 of optimal when k is odd, and it achieves significantly lower latencies than Valiant routing [18], the best previously known algorithm with optimal worst-case throughput. This paper presents a novel layer-multiplexed (LM) architecture for 3D on-chip networks that exploits the optimality of RPM together with the short inter-layer wiring delays enabled by 3D technology. The LM architecture replaces the one-layer-per-hop routing of a 3D mesh with simpler vertical demultiplexing and multiplexing structures. By adapting RPM routing to it, the LM architecture achieves the same worst-case throughput as a 3D mesh. For a symmetric 4 x 4 x 4 mesh topology, however, the LM architecture consumes 27% less power, occupies 27% less area, attains 14.5% higher average throughput, and achieves 33% lower worst-case hop count. On an asymmetric 8 x 8 x 4 mesh, the LM architecture achieves average-case throughput comparable to that of a 3D mesh, but consumes 26% less power, occupies 27% less area, and attains 20% lower worst-case hop count.
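
The RPM idea summarized in the abstract (spread traffic across vertical layers, then route minimally within a layer) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `rpm_route`, the uniform choice of intermediate layer, and the coin flip between XY and YX ordering are assumptions made for illustration, since the abstract does not spell out these details.

```python
import random

def rpm_route(src, dst, num_layers, rng=random):
    """Sketch of an RPM-style route on a 3D mesh (hypothetical helper).

    src, dst: (x, y, z) node coordinates; num_layers: number of vertical layers.
    Returns the list of nodes visited, including src and dst.
    """
    mid = rng.randrange(num_layers)        # assumed: intermediate layer chosen uniformly
    order = (0, 1) if rng.random() < 0.5 else (1, 0)  # assumed: XY or YX with equal odds
    cur = list(src)
    path = [tuple(cur)]

    def walk(axis, target):
        # Move one hop at a time along a single axis until the target is reached.
        while cur[axis] != target:
            cur[axis] += 1 if target > cur[axis] else -1
            path.append(tuple(cur))

    walk(2, mid)                 # phase 1: vertical hop(s) to the random layer
    for axis in order:           # phase 2: minimal routing within that layer
        walk(axis, dst[axis])
    walk(2, dst[2])              # phase 3: vertical hop(s) to the destination layer
    return path

if __name__ == "__main__":
    route = rpm_route((0, 0, 0), (3, 2, 1), num_layers=4)
    print(len(route) - 1, "hops:", route)
```

In this sketch the in-layer portion is always minimal (|dx| + |dy| hops), while the vertical portion may be non-minimal because of the detour through the random layer. That detour is what makes the route only "partially minimal".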